May 25, 2016

Major Hayden

Test Fedora 24 Beta in an OpenStack cloud

Although there are a few weeks remaining before Fedora 24 is released, you can test out the Fedora 24 Beta release today! This is a great way to get a sneak peek at new features and help find bugs that still need a fix.
Fedora Infinity Logo
The Fedora Cloud image is available for download from your favorite local mirror or directly from Fedora’s servers. In this post, I’ll show you how to import this image into an OpenStack environment and begin testing Fedora 24 Beta.
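Before importing, it's worth verifying the downloaded image against the CHECKSUM file Fedora publishes next to it on the mirror. A minimal sketch in Python (the filename below is illustrative; use whatever you actually downloaded):

```python
import hashlib

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the value published in Fedora's CHECKSUM file, e.g.:
# expected = "..."  # copy the hash from the CHECKSUM file on the mirror
# assert sha256_of_file("Fedora-Cloud-Base-24_Beta.qcow2") == expected
```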

One last thing: this is beta software. It has been reliable for me so far, but your experience may vary. I would recommend waiting for the final release before deploying any mission critical applications on it.

Importing the image

The older glance client (version 1) allows you to import an image from a URL that is reachable from your OpenStack environment. This is helpful since my OpenStack cloud has a much faster connection to the internet (1 Gbps) than my home does (~20 Mbps upload). However, the functionality to import from a URL was removed in version 2 of the glance client. The OpenStackClient doesn’t offer the feature either.

There are two options here:

  • Install an older version of the glance client
  • Use Horizon (the web dashboard)

Getting an older version of the glance client installed is challenging. The OpenStack requirements file for the Liberty release leaves the glance client without a maximum version cap, and it’s difficult to get all of the dependencies in order to make the older glance client work.

Let’s use Horizon instead so we can get back to the reason for the post.

Adding an image in Horizon

Log into the Horizon panel and click Compute > Images. Click + Create Image at the top right of the page and a new window should appear. Add this information in the window:

  • Name: Fedora 24 Cloud Beta
  • Image Source: Image Location
  • Image Location:
  • Format: QCOW2 – QEMU Emulator
  • Copy Data: ensure the box is checked

When you’re finished, the window should look like this:

Adding Fedora 24 Beta image in Horizon

Click Create Image and the images listing should show Saving for a short period of time. Once it switches to Active, you’re ready to build an instance.

Building the instance

Since we’re already in Horizon, we can finish out the build process there.

On the image listing page, find the row with the image we just uploaded and click Launch Instance on the right side. A new window will appear. The Image Name drop down should already have the Fedora 24 Beta image selected. From here, just choose an instance name, select a security group and keypair (on the Access & Security tab), and a network (on the Networking tab). Be sure to choose a flavor that has some available storage as well (m1.tiny is not enough).

Click Launch and wait for the instance to boot.
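While waiting, a small helper can poll the instance's SSH port until it becomes reachable. A rough sketch (the address at the bottom is a placeholder for your instance's floating IP):

```python
import socket
import time

def wait_for_port(host, port=22, timeout=120.0, interval=2.0):
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# wait_for_port("203.0.113.10")  # example floating IP of the new instance
```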

Once the instance build has finished, you can connect to the instance over ssh as the fedora user. If your security group allows the connection and your keypair was configured correctly, you should be inside your new Fedora 24 Beta instance!

Not sure what to do next? Here are some suggestions:

  • Update all packages and reboot (to ensure that you are testing the latest updates)
  • Install some familiar applications and verify that they work properly
  • Test out your existing automation or configuration management tools
  • Open bug tickets!


by Major Hayden at May 25, 2016 03:17 AM

Rob Hirschfeld

Open Source as Reality TV and Burning Data Centers [gcOnDemand podcast notes]

During the OpenStack summit, Eric Wright (@discoposse) and I talked about a wide range of topics from scoring success of OpenStack early goals to burning down traditional data centers.

Why burn down your data center (and move to public cloud)? Because your ops processes are too hard to change. Rob talks about how hybrid provides a path if we can make ops more composable.

Here are my notes from the audio podcast (source):

1:30 Why “zehicle” as a handle? Portmanteau from electric cars… zero + vehicle

Let’s talk about OpenStack & Cloud…

  • OpenStack History
    • 2:15 Rob’s OpenStack history from Dell and Hyperscale
    • 3:20 Early thoughts of a Cloud API that could be reused
    • 3:40 The practical danger of Vendor lock-in
    • 4:30 How we implemented “no main corporate owner” by choice
  • About the Open in OpenStack
    • 5:20 Rob decomposes what “open” means because there are multiple meanings
    • 6:10 Price of having all open tools for “always open” choice and process
    • 7:10 Observation that OpenStack values having open over delivering product
    • 8:15 Community is great but a trade off. We prioritize it over implementation.
  • Q: 9:10 What if we started later? Would Docker make an impact?
    • Part of challenge for OpenStack was teaching vendors & corporate consumers “how to open source”
  • Q: 10:40 Did we accomplish what we wanted from the first summit?
    • Mixed results – some things we exceeded (like growing community) while some are behind (product adoption & interoperability).
  • 13:30 Interop, Refstack and Defcore Challenges. Rob is disappointed on interop based on implementations.
  • Q: 15:00 Who competes with OpenStack?
    • There are real alternatives. APIs do not matter as much as we thought.
    • 15:50 OpenStack vendor support is powerful
  • Q: 16:20 What makes OpenStack successful?
    • Big tent confuses the ecosystem & pushes the goal posts out
    • “Big community” is not a good definition of success for the project.
  • 18:10 Reality TV of open source – people like watching train wrecks
  • 18:45 Hybrid is the reality for IT users
  • 20:10 We have a need to define core and focus on composability. Rob has been focused on the link between hybrid and composability.
  • 22:10 Rob’s preference is that OpenStack would be smaller. Big tent is really ecosystem projects and we want that ecosystem to be multi-cloud.

Now, about RackN, bare metal, Crowbar and Digital Rebar….

  • 23:30 (re)Intro
  • 24:30 VC market is not metal friendly even though everything runs on metal!
  • 25:00 Lack of consistency translates into lack of shared ops
  • 25:30 Crowbar was an MVP – the key is to understand what we learned from it
  • 26:00 Digital Rebar started with composability and focus on operations
  • 27:00 What is hybrid now? Not just private to public.
  • 30:00 How do we make infrastructure not matter? Multi-dimensional hybrid.
  • 31:00 Digital Rebar is orchestration for composable infrastructure.
  • Q: 31:40 Do people get it?
    • Yes. Automation is moving to hybrid devops – “ops is ops” and it should not matter if it’s cloud or metal.
  • 32:15 “I don’t want to burn down my data center” – can you bring cloud ops to my private data center?

by Rob H at May 25, 2016 12:45 AM

May 24, 2016

The Official Rackspace Blog

As Demand for Bare Metal Increases, Rackspace’s OpenStack Portfolio Delivers

The surging interest in bare metal at last month’s OpenStack Summit in Austin might have been a surprise to some, but the signs were there.

The keynote for the Ironic project, which provides a number of key features required for managing bare metal nodes in a cloud, was one of the most buzzed-about talks at the Summit. Meanwhile, the 2016 OpenStack User Survey found bare metal is now the third most intriguing new technology, with fully half of users expressing strong interest. That's up from interest so slight just six months earlier that it fell into the “Other” category, representing just 4 percent of respondents’ interest.

OpenStack User Survey chart (Source: OpenStack User Survey, April 2016)

But it wasn’t a surprise to OpenStack experts at Rackspace. We see the rising interest as a natural progression in the needs and use cases for OpenStack users, as well as a general maturation of the market.

After all, bare metal offers faster speeds and greater performance consistency, without the overhead and complexity of the hypervisor. And as cloud environments and IT operations become more complex, businesses need a simpler way to scale and manage their growth.

(Register now for our live May 25 webinar “Unraveling Infrastructure Complexities with Bare Metal”)

At the same time, as workloads have become more demanding and require more computation capabilities, users need to be able to rely upon consistent performance and greater raw power in their cloud environment.

Increased interest in bare metal, which in many cases offers solutions to both of these challenges, indicates that the market is ready for more.

For years, our customers have been asking us to let them have their cake and eat it, too. They want the instant deployment and API accessibility of public cloud. But they also want the raw power, consistent performance, and cost efficiency at scale of dedicated servers.

Our answer came in July 2014, when we launched OnMetal Cloud Servers, the bare metal offering within our broader set of OpenStack solutions — private, public and hybrid clouds across traditional virtualized, container-based and bare metal environments, all using a standard set of APIs to help customers simplify their experience.

As I’ve written previously, OnMetal Cloud Servers offer all the capabilities customers have been demanding: the power, consistent performance and cost efficiency at scale of dedicated servers, plus the instant-on deployment and API accessibility of the public cloud.

Earlier this year, we launched the second generation of OnMetal Cloud Servers, which provide innovative hybrid connectivity, improved storage and international availability.

All of these capabilities make OnMetal Cloud Servers an excellent choice for highly demanding and intensive workloads such as Cassandra, Docker and Spark.

Indeed, customers such as Brigade, a startup building tools for people to express their civic identity, learn about their friends and neighbors and work toward common goals together, use OnMetal Cloud Servers to provide the speed, stability and reliability required to power its omnichannel app. Brigade’s query times have decreased more than 97 percent, from 7.3 seconds to 200 milliseconds, since moving to OnMetal Cloud Servers.

If your organization is looking for a solution that combines bare metal speed, consistent performance and instant scalability, register today to participate in our May 25 live webinar, “Unraveling Infrastructure Complexities with Bare Metal.”

Source: OpenStack User Survey, April 2016, accessed 2016-05-12

The post As Demand for Bare Metal Increases, Rackspace’s OpenStack Portfolio Delivers appeared first on The Official Rackspace Blog.

by Jason Barnhill at May 24, 2016 03:00 PM

Ronald Bradford

Understanding the Oslo Libraries

Underpinning all of the OpenStack projects including Nova, Cinder, Keystone, Glance, Horizon, Heat, Trove, Murano and others is a set of core common libraries that provide a consistent, highly tested and compatible feature set. The Oslo project is a collection of over 30 libraries that are designed to reduce the technical debt of code duplication across projects and provide for a greater quality code path due to the frequency of use in OpenStack projects.

These libraries provide a variety of different features from the more commonly used functionality found in projects including configuration, logging, caching, messaging and database management to more specific features like deprecation management, handling plugins as well as frameworks for command line programs and state machines. The Oslo Python libraries are designed to be Python 2.7 and Python 3.4 compatible, leading the way in migration towards Python 3.

The first stable Oslo library, oslo.config, was included in the Grizzly release. Now over 30 libraries comprise the Oslo project. These libraries fall into a number of broad categories.

1. Stable OpenStack specific libraries

These libraries, using the oslo. prefix, are generally well described by the library name.

  • oslo.cache
  • oslo.concurrency
  • oslo.context
  • oslo.config
  • oslo.db
  • oslo.i18n
  • oslo.log
  • oslo.messaging
  • oslo.middleware
  • oslo.policy
  • oslo.privsep
  • oslo.reports
  • oslo.serialization
  • oslo.service
  • oslo.utils
  • oslo.versionedobjects
  • oslo.vmware
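To give a feel for the pattern oslo.config implements (options declared with defaults, then layered over by one or more configuration files), here is a stdlib-only sketch. Note this is not oslo.config's actual API, which uses cfg.StrOpt/cfg.IntOpt declarations and a global cfg.CONF object; the option names below are made up:

```python
import configparser

# Declarative defaults, roughly analogous to registering opts in oslo.config.
DEFAULTS = {
    "DEFAULT": {"debug": "false", "log_dir": "/var/log/myservice"},
    "database": {"connection": "sqlite://", "max_retries": "10"},
}

def load_config(*paths):
    """Layer config files over the declared defaults; later files win."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)
    parser.read(paths)  # silently skips files that do not exist
    return parser

# conf = load_config("/etc/myservice/myservice.conf")
# conf.getint("database", "max_retries")  # 10 unless a file overrides it
```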

2. Python libraries that can easily operate with other projects

In addition to the oslo namespace libraries, Oslo has a number of generically named libraries that are not OpenStack specific. The goal is that these libraries can be utilized outside of OpenStack by any Python project. These include:

  • automaton – a framework for building state machines.
  • cliff – a framework for building command line programs.
  • debtcollector – a collection of Python patterns that help you collect your technical debt in a non-destructive manner (following deprecation patterns and strategies and so on).
  • futurist – a collection of async functionality and additions from the future.
  • osprofiler – an OpenStack cross-project profiling library.
  • hacking – a library that provides a set of tools for enforcing coding style guidelines.
  • pbr – (or Python Build Reasonableness) is an add-on library that helps provide (and enforce) a set of sensible default setuptools behaviours.
  • pyCADF – a python implementation of the DMTF Cloud Audit (CADF) data model.
  • stevedore – a library for managing plugins for Python applications.
  • taskflow – a library that helps create applications that handle state/failures… in a reasonable manner.
  • tooz – a library that aims at centralizing the most common distributed primitives, like group membership protocols, lock services and leader election.
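As a flavor of what a state-machine framework like automaton offers, here is a minimal pure-Python sketch of the underlying idea (this is not automaton's actual API, and the instance lifecycle is a toy example):

```python
class StateMachine:
    """A tiny explicit-transition state machine."""

    def __init__(self, start, transitions):
        # transitions maps (state, event) -> next state
        self.state = start
        self.transitions = transitions

    def process(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise ValueError(f"no transition for {event!r} from {self.state!r}")
        self.state = self.transitions[key]
        return self.state

# A toy instance lifecycle, loosely inspired by cloud server states:
machine = StateMachine("building", {
    ("building", "booted"): "active",
    ("active", "shutdown"): "stopped",
    ("stopped", "boot"): "active",
})
```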

3. Convenience libraries

There are also several libraries that are used during the creation of, or support of OpenStack libraries.

The first was oslo-incubator, where, as the name suggests, initial libraries were incubated. As this code matured it was refactored into standard libraries. Projects have either graduated, been incorporated elsewhere or been deprecated. While the Oslo Incubator was emptied of libraries in Mitaka, one of the goals of the Newton cycle is to see the adoption of Oslo libraries in all projects. We will be providing a series of blogs detailing walkthroughs and reviews of existing projects for reference.

Other libraries include:

  • oslosphinx is a sphinx add-on library that provides theme and extension support for generating documentation with Sphinx. The Developer Documentation, Release Notes, a number of the OpenStack manuals including the Configuration Reference and now the Nova API Reference rely on this library.
  • oslotest is a helper library that provides base classes and fixtures for creating unit and functional tests.
  • oslo-cookiecutter is a project that creates a skeleton Oslo library from a set of templates.

4. Proposed or deprecated libraries

Some libraries fall outside of these categories, such as oslo.rootwrap. This was a mature library for fine-grained filtering of shell commands to be run as root. It is now deprecated in favor of oslo.privsep, a mechanism for running selected Python code with elevated privileges.

pylockfile is a legacy (and adopted) inter-process lock management library that was never used within OpenStack.

oslo.version is an example of a currently proposed library, intended to help use Python metadata to determine versioning.

The Oslo team is also evaluating what other common code may be suitable for an Oslo library.

The meaning behind the Oslo Name

Each OpenStack project has some reason behind the name. Oslo is in reference to the Oslo Peace Accords and “bringing peace” to the OpenStack project.

Oslo is also the capital of Norway, and in Norway you can find moose. The moose is the project mascot.

by ronald at May 24, 2016 01:41 PM


[Event Report] RDO Bug Triage Day on 18th and 19th May, 2016

As RDO development continues to grow, the number of bugs also grows. To keep the community healthy, it is also necessary to triage and fix the existing bugs so that the product will be more valuable.

We had hosted RDO Bug Triage Day on 18th and 19th May, 2016.

The main aims of the triage day were:

  • Mass closing of "End of Life" bugs through automation, with a proper EOL statement. At the 19th May, 2016 RDO meeting, after evaluating the Fedora EOL statement, we came up with this: "This bug is against a Version which has reached End of Life. If it's still present in a supported release (, please update Version and reopen." Thanks to Rich Bowen for proposing it.
  • Analyze each bug, assign the proper component and target release, and ask for more information if the bug is incomplete
  • Close the Bug if it is already fixed in the current release
  • Provide patches to the existing Bugs.

Those two days were awesome for RDO.

Here are some of the stats from the RDO Bug Triage Day:

  • 433 bugs were affected on Bug Triage Day.
  • 398 bugs were closed, of which 365 were EOL bugs closed by the automation script.
  • 35 bugs were triaged; most of the triaged EOL bugs were closed by the EOL automation script.
  • 22 people participated in the RDO Bug Triage Day.

Thanks to Pradeep Kilambi, Peter Nordquist, Javier Pena, Ivan Chavero, Matthias Runge, Suraj Narwade, Christopher Brown, Dan Prince, Mike Burns, Dmitry Tantsur, Alfredo Moralejo, Alan Pevec, Miroslav Suchy, Masanari Matsuoka, Garth Mollett, Giulio Fidente, David Moreau Simard, Chandan Kumar and Emilien Macchi for participating and making the RDO Bug triage day successful.

The above stats were generated using a script.
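The mass-closing script itself isn't reproduced here, but the heart of such automation is a status update per bug through Bugzilla's REST API. A hypothetical sketch, with the endpoint and field names following Bugzilla REST conventions but all RDO-specific details assumed:

```python
import json
import urllib.request

BUGZILLA = "https://bugzilla.redhat.com/rest"  # assumed endpoint

EOL_COMMENT = (
    "This bug is against a Version which has reached End of Life. If it's "
    "still present in a supported release, please update Version and reopen."
)

def eol_update_payload(api_key):
    """Build the JSON body for closing one bug as EOL via Bugzilla REST."""
    return {
        "api_key": api_key,
        "status": "CLOSED",
        "resolution": "EOL",
        "comment": {"body": EOL_COMMENT},
    }

def close_bug(bug_id, api_key):
    """PUT the EOL update to a single bug (network call, not run here)."""
    req = urllib.request.Request(
        f"{BUGZILLA}/bug/{bug_id}",
        data=json.dumps(eol_update_payload(api_key)).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return urllib.request.urlopen(req)
```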

by chandankumar at May 24, 2016 12:59 PM


OpenStack-based apps: The technology is easy, it’s the culture that’s hard


Want to hear more from Craig on App development? Join us for Accelerating App Development with Continuous Integration Kubernetes on OpenStack.

At the OpenStack summit last month in Austin, Mirantis’ Craig Peters and the OpenStack Foundation’s David Flanders got together to discuss application development on OpenStack with Superuser.TV.

They talked about:

  • The importance of the developer experience
  • Use cases for OpenStack: what can you actually DO with it?
  • Bridging mobile and traditional applications
  • Enterprise applications and how to take advantage of existing transactional systems
  • How a tiger team can “infect” the rest of the organization with the knowledge of how the cloud brings value
  • The convergence of clouds and containers
  • Why users need an application catalog to find resources at the right level — and how it shows patterns of use
  • The fact that security still matters and why Craig thinks some apps will be in VMs for the rest of our lives
  • What NOT to do with your cloud infrastructure
  • Making developers support their code in production, and why that’s a Good Thing
  • What CIOs should be thinking about when it comes to application development on OpenStack
  • And of course, Skynet.

Watch the whole video here:

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="315" src="" width="560"></iframe>

The post OpenStack-based apps: The technology is easy, it’s the culture that’s hard appeared first on Mirantis | The Pure Play OpenStack Company.

by Nick Chase at May 24, 2016 04:01 AM

May 23, 2016

OpenStack Superuser

How Burton Snowboards is carving down the OpenStack trail

Burton Snowboards has always sailed down the road less traveled. In 1977 Jake Burton Carpenter built the first snowboard factory in Burlington, Vermont. Now the world’s largest snowboarding brand, they also peddle bindings, boots, outerwear and accessories.

To succeed, they’ve always followed the “show, don’t tell” philosophy. In the 80s, Carpenter and his wife Donna were the original brand ambassadors, swooshing down the pistes of Austria to gain traction with adrenaline-seeking skiers looking for something new. Now the company produces a number of video series, including Burton Presents, which clocks hundreds of thousands of YouTube views. The series showcases top riders as they perform chin-dropping tricks off remote mountainsides with original films that run from 7-12 minutes. Burton also produces Burton Girls, shorter how-tos, and captures the excitement at numerous events around the world every year.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="" src="" width=""></iframe>

Turns out those gravity-defying videos are heavy business, at least when it comes to storage. In addition to the finished products, there are countless hours of raw footage that need to be stored and archived yet available to remix for future videos. In a talk at the recent Austin Summit, Jim Merritt, senior systems administrator, says the towering mountain of video footage that required safekeeping was putting a cramp in the data center.

“Big data is any amount of data that becomes a problem for you or your business,” he says. “If it’s problematic, it’s big data.” All of that video data has tremendous marketing value, he adds, so it needs to be kept and accessible for future use. In the 36-minute talk, he shares how Burton discovered OpenStack Swift object storage as a solution and the criteria that went into choosing to deploy it for both immediate and future needs.

Here’s what the set-up at Burton looked like before the storage revamp:

alt text here

Merritt drew on six years of experience in the genomics field, where he had to deal with petabyte-scale data management, to find a solution. One useful concept from the previous work was to categorize the data into raw and intermediate data, raw data being that initial data that cannot be recreated any other way.

alt text here

“When I’m doing data protection or disaster recovery, I can forgo the intermediate data. I can recreate that from the raw,” Merritt says. “It may take me a day, it may take me an hour, it may take me take a month, but I can recreate it in a disaster.” Here’s a snapshot of what their infrastructure looks like now:

alt text here

alt text here

“All this took a bit of a paradigm shift for us,” Merritt says. “Before this, data was a wild west, we had data going everywhere.”

The lesson learned, he added, was that Burton had to step back, look at their data and come up with a data structure that was more appropriate yet let the end users have easy access to it. A year after the switch, there were a few adjustments that the team made but for the most part "it just works," he says.

You can catch the entire 36-minute talk on the OpenStack Summit site.

<iframe allowfullscreen="allowfullscreen" frameborder="0" height="" src="" width=""></iframe>

by Nicole Martinelli at May 23, 2016 08:45 PM

OpenStack and Kubernetes join forces for an Internet of Things platform

This post explains the open source IoT platform introduced in the OpenStack Summit keynote in Austin in more detail. First we explain our approach and vision for IoT, then offer a technical overview and show two sample use cases.

<iframe allowfullscreen="" class="object-responsive" frameborder="0" height="" scrolling="no" src="" width=""></iframe>

The Internet of Things (IoT) is “the next big thing” of the cloud computing era. Leading industry vendors present their solutions for IoT and interpret them to fit their business strategies; for this reason, IoT is abused as a new buzzword for vendor-proprietary business solutions. The term IoT can mean almost anything, and it is even less specific than cloud computing services. The Internet of Things revolves around increased machine-to-machine communication; it is built on networks of data-gathering sensors and actuators connected to cloud computing services that process all the information. It is going to make everything in our lives, from streetlights to seaports, “smart.”

At tcp cloud we look at IoT differently than other vendors. Just as we deliver private cloud solutions to our customers, we use existing open source projects and extend our cloud service approach to create a universal IoT platform that can handle multiple use cases. We have defined the following requirements:

  • Open source software – The whole platform must be based on existing open source solutions and must not be developed by a single vendor. We wanted to make use of existing platforms: OpenStack, Kubernetes, Docker, OpenContrail, etc.
  • HW and vendor independence – No vendor lock-in on either the software or the hardware side. The IoT gateway CPU must have either x86/64 or ARM architecture. We do not want to be locked to any vendor with expensive proprietary appliances.
  • Interoperability – The IoT platform must be universal and usable for multiple use cases. For instance, IoT gateways can be used in street lamps for counting objects the same way as in a smart factory or Industry 4.0 application.

Therefore we designed the following high-level architecture, which uses the open source projects OpenStack, Kubernetes, OpenContrail and Docker.


More technical details are covered in section Technology Overview. Let’s have a look at first two Use Cases, where we started prototyping the solution.

Smart City Prototype

The first use case is the SmartCity project of Pisek, a small city in the Czech Republic. The SmartCity concept and architecture will deploy over 3,000 endpoints and approximately 300 IoT gateways that run in high-availability mode in Kubernetes-managed containers. Part of the solution is an open data portal and a data API available to third-party companies, providing information about:

  • Traffic flow, routing, parking
  • Monitoring, management, energy saving
  • E-commerce, marketing, tourist information
  • Environmental analysis
  • Lifestyle, social services, social networks


The target solution uses Raspberry Pi 2 devices as IoT gateways. Data from the gateways is stored in Graphite and processed by custom data-mining applications, and the results are displayed in the city citizens’ portal based on Leonardo CMS, a web service that allows mixing complex visualizations with arbitrary content. This open data portal enables data access through dashboard visualizations or an API.

The following screen shows sample output from the Kollarova x Zizkova crossroad, with vehicle and pedestrian passages for a specific period.


You can read more about this project in “A step forward in making cities smarter” in Superuser Magazine from Austin, or in the following presentation from KubeCon 2016.

Smart Conference at OpenStack Summit Austin

To prove that our IoT platform is truly independent of the application environment, we took one IoT gateway (a Raspberry Pi 2) from the city project and put it in the Austin Convention Center during the OpenStack Summit, together with an IQRF-based mesh network connecting sensors that measure humidity, temperature and CO2 levels. This demonstrates that the IoT gateway can manage or collect data from any technology, such as IQRF, Bluetooth, GPIO, or any other communication standard supported on Linux-based platforms.

We deployed 20 sensors and 20 routers on three conference floors, with a single active IoT gateway receiving data from the entire IQRF mesh network and relaying it to a dedicated time-series database, in this case Graphite. The collector is an MQTT-Java bridge running inside a Docker container managed by Kubernetes. Most interesting is the distance between the Docker container on the Raspberry Pi at the conference and the virtual machine running in a European data center; the dynamic network overlay tunnels are provided by OpenContrail SDN. Further explanation is covered in the Technology Overview section.
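On the Graphite side, the bridge's job is simple: Carbon's plaintext protocol accepts one `metric.path value timestamp` line per measurement over TCP (port 2003 by default). A minimal sender might look like this sketch (the metric names and host are made up):

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one measurement for Graphite/Carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_metric(host, path, value, port=2003):
    """Ship a single metric to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(path, value).encode())

# send_metric("graphite.example.com", "summit.floor2.room1.co2", 612)
```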


The following picture shows a single wireless IQRF mesh network during sensor and router discovery. Zones 0-1 cover conference floor 1 and zones 2-4 cover floor 4.


IQRF is a wireless mesh technology operating in the sub-gigahertz ISM bands. It provides very easy integration, product interoperability, robust mesh networking with a maximum of 240 hops, a range of hundreds of meters and ultra-low-power operation.

The following screenshot shows real-time CO2 values from different rooms on two floors. The historical graph shows values from Monday. You can easily recognize when the main keynote session started and when the lunch period was.


The following schema covers the services used for collecting the Austin data:


Technology Overview

This part further explains our technical concept for the IoT platform. The platform was created with a general vision: to collect, manage and process data from thousands of endpoints securely and dynamically, with centralized management.

The architecture is therefore divided into two main parts:


The datacenter is the central point of management for the entire IoT platform. An OpenStack IaaS cloud with virtual machines runs there, alongside the cloud and SDN control planes. These machines contain time-series storage, data processing clusters, data API proxy access, visualization web services, etc.


IoT gateways are located at any target place, such as street lamps, factory machines or home appliances. The SDN provides a transport layer connecting the remote IoT gateways with the cloud services. Gateways can be multi-platform; it is possible to mix x86/64 and ARM devices today. Multiple sensor platforms for multiple customers can be hosted on a single gateway thanks to microservice segmentation (Docker containers) and Kubernetes multi-tenancy support. The platform is able to provide a scalable multi-tenant space where applications and sensors are on the same network regardless of the distance.

The following schema shows the datacenter layers and the components on the gateway side. The Detail Schema section provides deeper information.


Detail Schema

The detail schema provides a logical view of the architecture of the whole IoT platform. The left side shows the datacenter and the right side the gateway, as explained in the previous section.

As you can see below, OpenStack is used as the cloud hosting all control services as well as all big data processing and frontend visualisation units. Kubernetes on the gateways is used for micro-segmentation of services, which is necessary for multi-tenancy and security between different sensors. OpenContrail is used to connect both sides and provide network segmentation between Kubernetes pods and OpenStack project VMs.


As already mentioned, segmentation is done by SDN overlays. All that is required is IP connectivity between the datacenter edge router and the IoT gateway. The lowest layer is a VPN between the gateway OS and the edge router in the DC. The next layer is the SDN, where OpenContrail enables direct communication between a VM (OpenStack cloud) and a container (gateway). This approach makes it possible to choose from a variety of sensors and actuators, give them privileges and securely connect them with processing applications inside the cloud.

Datacenter contains following services:

Management Services
An HW cluster running VMs with all control services: the OpenStack controller, the OpenContrail controller (SDN), the Kubernetes master and the Salt master.

OpenStack Cloud
OpenStack projects provide segmentation for the different virtual machine services, such as databases (Graphite, InfluxDB, OpenTSDB), big data processing (Hadoop) and data visualization (Grafana, Leonardo CMS). The cloud runs on KVM hypervisors and uses the OpenContrail Neutron plugin for networking.

Edge Routers
OpenContrail establishes iBGP peering with the datacenter edge routers, dynamically propagating network routes from both OpenStack VMs and Kubernetes pods on the IoT gateways. It creates a standard L3VPN as MPLSoverGRE or MPLSoverUDP.

Remote gateways contain the following components:

Kubernetes Minion
The Kubernetes minion communicates with the Kubernetes master in the datacenter and manages pods through kubelet. Kubelet uses the OpenContrail plugin, which connects Docker containers to the vRouter agent.

Kubernetes PODs
Kubernetes pods are single or multiple Docker containers connected to the vRouter. Pods are segmented by labels, which makes it possible to start different applications that read from different message buses such as IQRF, Bluetooth or GPIO.
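As a sketch of how label-based segmentation might look, here is a minimal pod manifest built in Python and emitted as JSON (which kubectl also accepts alongside YAML). The tenant, bus and image names are hypothetical illustrations, not taken from the deployment described here:

```python
import json

# Hypothetical labels: one tenant's sensor-reader pod, bound to the IQRF bus.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "iqrf-reader",
        "labels": {"tenant": "customer-a", "bus": "iqrf"},
    },
    "spec": {
        "containers": [{
            "name": "reader",
            # Placeholder image name for the bus-specific reader application.
            "image": "example/iqrf-reader:1.0",
        }],
    },
}

manifest = json.dumps(pod, indent=2)
print(manifest)
```

A label selector such as `tenant=customer-a,bus=iqrf` would then match only this tenant's pods, which is the kind of segmentation the networking plugin can build on.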

Docker Containers
Docker containers in Kubernetes pods bring the great benefit of providing almost any operating system environment without special installation. For instance, IQRF uses a simple Java application pinned to a specific version, which can be delivered as a container in a few minutes without conflicting with the operating system of the gateway itself.

Application View

The following schema explains the application view. It shows that VMs inside the OpenStack cloud can reach a Docker container in any geographic location over an L2 or L3 private network, thanks to the OpenContrail overlay. Application developers can therefore use the same tools they use in a standard cloud: they can deploy a VM application controller through Heat and then Kubernetes services in containers on remote gateways through simple YAML files.


For instance, we gather data from environmental sensors. A sensor is directly connected to a Docker container, where the data is processed and sent to a Graphite time series database. Because we want to present the data graphically and in real time, we use another VM running LeonardoCMS, which reads from the Graphite APIs and displays the data on a website. Following the same principles, we can create different projects in the same cloud with multiple inputs and outputs.
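To illustrate the sensor-to-Graphite step, here is a minimal Python sketch of Carbon's plaintext protocol (one "metric.path value timestamp" line per reading). The metric name, value and Graphite host are hypothetical; the network send is isolated in its own function so the formatting can be shown on its own:

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Render one line of Carbon's plaintext protocol:
    "<metric.path> <value> <unix-timestamp>\n"."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_metric(line, host="graphite.example.com", port=2003):
    """Ship a formatted line to carbon-cache; 2003 is Carbon's
    default plaintext listener port. Host name is a placeholder."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# A hypothetical environmental reading:
line = format_metric("iot.office1.temperature", 21.7, 1463844000)
print(line.strip())  # iot.office1.temperature 21.7 1463844000
```

A container processing sensor readings would call `send_metric(format_metric(...))` for each sample, and Graphite takes care of storage and the render APIs that the visualization layer reads from.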


We have tried to briefly explain our vision and the prototype deployments of the tcp cloud IoT platform. We are currently working on the detailed design for the whole Smart City solution.

We got a great response from the community after live showcases at the OpenStack Summit in Austin and KubeCon in London this year. Our concept seems to be accepted as a viable way of dealing with the security, resilience and performance concerns of IoT platforms, and many technology partners want to join us in our effort and extend our IoT platform into their solutions. We are now starting on an Industry 4.0 concept to create the first Smart Factory based on open source projects.

If you want to see IoT in action, you can register for OpenStack Day Prague, where we will give another live presentation with our industry partners.

If you are interested in collaborating with us, get in touch or follow us on Twitter or read our blog.


This post first appeared on tcp cloud's blog. Superuser is always interested in community content, email:

Cover Photo // CC BY NC

by Jakub Pavlik at May 23, 2016 06:39 PM


RDO Blogs: week of May 23, 2016

Here's what RDO engineers have been blogging about over the last week:

Connecting another vm to your tripleo-quickstart deployment by Lars Kellogg-Stedman

Let's say that you have set up an environment using tripleo-quickstart and you would like to add another virtual machine to the mix that has both "external" connectivity ("external" in quotes because I am using it in the same way as the quickstart does w/r/t the undercloud) and connectivity to the overcloud nodes. How would you go about setting that up?

…

Reproducing an Open vSwitch Bridge Configuration by Adam Young

In the previous post, I described the setup for installing FreeIPA on a VM parallel to the undercloud VM setup by Tripleo Quickstart. The network on the undercloud VM has been setup up by Ironic and Neutron to listen on a network defined for the overcloud. I want to reproduce this on a second machine that is not enrolled in the undercloud. How can I reproduce the steps?

…

ARA: An idea to store, browse and troubleshoot Ansible Playbook runs by David Moreau Simard

Ansible can be used for a lot of things and it’s grown pretty popular for managing servers and their configuration. In the RDO and OpenStack communities, Ansible is heavily used to deploy or test OpenStack through Continuous Integration (CI). Projects like TripleO-Quickstart, WeIRDO, OpenStack-Ansible or Zuul v3 are completely driven by Ansible.

…

by Rich Bowen at May 23, 2016 06:19 PM

Kenneth Hui

Benefits Of The Rackspace Approach To OpenStack

Quick message: Eric Siebert over at vSphere-land is running his annual vote for top virtualization blogs. This is the first year that I’ve put my blog up for consideration and would appreciate your vote if you’ve found this blog useful to you. Even if you are not inclined to vote for this blog as one of your top 12 blogs to read, I would still encourage you to go and vote for your favorites. Thank you.


As the co-founder and standard-bearer for OpenStack, Rackspace gets a lot of questions from users, journalists, analysts and vendors — about how we run OpenStack at scale, whether we use upstream code or have forked the project, and how we decide what code to contribute back.

Given that Rackspace runs the oldest and largest OpenStack public cloud in the world, was the first to offer OpenStack private cloud as a service and runs some of the largest private clouds in production, it’s important we address those questions.

In this post, I aim to do that by describing how we operate OpenStack in our public and private clouds, as well as the philosophy that guides our choices. I’ll explain how we decide which projects to include and what to contribute back to the community while running clouds that are hosting hundreds of thousands of instances.

Most importantly, I want to talk about how the approach Rackspace takes benefits both end users and the OpenStack community at large.

This post is a little longer than I usually write, but I believe it will be valuable to readers — so sit down with your caffeinated beverage of choice and get comfortable.

To read more about why I think OpenStack as a Service is the best consumption model, please click here to go to my article on the Rackspace blog site.

Filed under: Cloud, Cloud Computing, Open Source, OpenStack, Private Cloud, Public Cloud Tagged: Cloud, Cloud computing, Open Source, OpenStack, Private Cloud, Public Cloud, Rackspace

by kenhui at May 23, 2016 03:57 PM

OpenStack Superuser

How OpenStack public cloud + Cloud Foundry = a winning platform for telecoms

Swisscom has one of the largest in-production, industry-standard platform-as-a-service offerings built on OpenStack.

Their offering focuses on providing an enterprise-grade PaaS environment to customers worldwide, with various delivery models based on Cloud Foundry and OpenStack. Swisscom, Switzerland’s leading telecom provider, embarked early on the OpenStack journey to deploy its app cloud, partnering with Red Hat, Cloud Foundry and PLUMgrid.

Superuser interviewed Marcel Härry, chief architect, PaaS at Swisscom and member of the Technical Advisory Board of the Cloud Foundry Foundation to find out more.

How are you using OpenStack?

OpenStack has allowed us to rapidly develop and deploy our Cloud Foundry-based PaaS offering, as well as to rapidly develop new features within SDN and containers. OpenStack is the true enabler for rapid development and delivery.

An example: after half a year from the initial design and setup, we already delivered two production instances of our PaaS offering built on multiple OpenStack installations on different sites. Today we are already running multiple production deployments for high-profile customers, who further develop their SaaS offerings using our platform. Additionally, we are providing the infrastructure for numerous lab and development instances. These environments allow us to harden and stabilize new features while maintaining a rapid pace of innovation, while still ensuring a solid environment.

We are running numerous OpenStack stacks, all limited - by design - to a single region, and single availability zone. Their size ranges from a handful of compute nodes, to multiple dozens of compute nodes, scaled based on the needs of the specific workloads. Our intention is not to build overly large deployments, but rather to build multiple smaller stacks, hosting workloads that can be migrated between environments. These stacks are hosting thousands of VMs, which in turn are hosting tens of thousands of containers to run production applications or service instances for our customers.

What kinds of applications or workloads are you currently running on OpenStack?

We’ve been using OpenStack for almost three years now as our infrastructure orchestrator. Swisscom built its Elastic Cloud on top of OpenStack. On top of this we run Swisscom’s Application Cloud, or PaaS, built on Cloud Foundry with PLUMgrid as the SDN layer. Together, the company’s clouds deliver IaaS to IT architects, SaaS to end users and PaaS to app developers among other services and applications. We mainly run our PaaS/Cloud Foundry environment on OpenStack as well as the correlated managed services (i.e. a kind of DBaaS, Message Service aaS etc.) which are running themselves in Docker containers.

What challenges have you faced in your organization regarding OpenStack, and how did you overcome them?

The learning curve for OpenStack is pretty steep. When we started three years ago almost no reference architectures were available, especially none with enterprise-grade requirements such as dual-site, high availability (HA) capabilities on various levels and so forth. In addition, we went directly into the SDN, SDS levels of implementation which was a big, but very successful step at the end of the day.

What were your major milestones?

Swisscom’s go-live for its first beta environment was in spring of 2014, go live for an internal development (at Swisscom) was spring of 2015, and the go-live for its public Cloud Foundry environment fully hosted on OpenStack was in the fall of 2015. The go-live date for enterprise-grade and business-critical workloads on top of our stack from various multinational companies in verticals like finance or industry is spring, 2016, and Swisscom recently announced Swiss Re as one of its first large enterprise cloud customers.

What have been the biggest benefits to your organization as a result of using OpenStack?

Pluggability and multi-vendor interoperability (for instance with SDN like PLUMgrid or SDS like ScaleIO) to avoid vendor lock in and create a seamless system. OpenStack enabled Swisscom to experiment with deployments utilizing a DevOps model and environment to deploy and develop applications faster. It simplified the move from PoC to production environments and enabled us to easily scale out services utilizing a distributed cluster-based architecture.

What advice do you have for companies considering a move to OpenStack?

It’s hard in the beginning, but it’s really worth it. Be wise when you select your partners and vendors; this will help you be online in a very short amount of time. Think about driving your internal organization towards a DevOps model to be ready for the first deployments, as well as enabling your firm to change deployment models (e.g. going cloud-native) for your workloads when needed.

How do you participate in the community?

This year’s Austin event was our second OpenStack Summit where we provided insights into our deployment and architecture, contributing back to the community in terms of best practices, as well as providing real-world production use-cases. Furthermore, we directly contribute patches and improvements to various OpenStack projects. Some of these patches have already been accepted, while a few are in the pipeline to be further polished for publishing. Additionally, we are working very closely together with our vendors - RedHat, EMC, ClusterHQ/Flocker, PLUMgrid as well as the Cloud Foundry Foundation - and work together to further improve their integration and stability within the OpenStack project. For example, we worked closely together with Flocker for their cinder-based driver to orchestrate persistency among containers. Furthermore, we have provided many bug reports through our vendors and have worked together with them on fixes which then have made their way back into the OpenStack community.

What’s next?

We have a perfect solution for non-persistent container workloads for our customers. We are constantly evolving this product and are working especially hard to meet the enterprise and finance verticals’ requirements when it comes to the infrastructure orchestration of OpenStack.

Härry spoke about OpenStack in production in a 40-minute session at the recent Austin Summit, along with Pere Monclus of PLUMgrid, Chip Childers of the Cloud Foundry Foundation, Chris Wright of Red Hat and analyst Rosalyn Roseboro.

by Superuser at May 23, 2016 02:55 PM

Hugh Blemings



Welcome to Last week on OpenStack Dev (“Lwood”) for the week just past. For more background on Lwood, please refer here.

Basic Stats for week 16 May to 22 May 2016 for openstack-dev:

  • ~584 Messages (down about 27% relative to last week)
  • ~194 Unique threads (down about 17% relative to last week)

After last week’s busiest week in Lwood history, a return to average traffic levels this week.  This week is the first where I’m actively keeping an eye on the rather quieter openstack-operators and openstack-community lists, not sure if this will be a long term change, we’ll see :)

Notable Discussions – openstack-dev

New API guidelines for review

Mike McCune writes that there are two new API guidelines ready for review by interested parties;

Request for Volunteer Trainers at PyCon Portland OR

David Flanders notes that the OpenStack Foundation has been given the opportunity to run a 90 minute training session for Application Developers at the upcoming PyCon in Portland, OR.  As he rightly points out, “This is a great opportunity to road test the SDKs with our main user audience: application developers.”  If you’re interested in helping out, please contact David ASAP :)

A refresher on the global requirements process

Dims Srinivas provides a nice concise primer/reminder on how to work with the global requirements process as it currently stands and also notes there is a new team being formed to further streamline the process.

Languages vs. Scope of OpenStack (was The Monster Thread :)

In his initial post and a subsequent reply to the thread Thierry Carrez seeks to summarise the core issues brought to light by the recent thread on bringing golang in as a supported language for developing core OpenStack projects/code.  At the time of writing the thread is actually pretty short so you may want to read the various well thought through contributions yourself, but in essence;

Some projects in OpenStack are more low-level than others and require the sort of optimisation that can only be achieved in languages other than python.  It’s possibly helpful to think of language choice in these terms rather than the specific language itself.

A key question is where does OpenStack stop and the wider Open Source community start – it’s suggested that there’s a couple of ways to think of this;

  • The first way is community-centric: “anything produced by our community is OpenStack”
  • The other way is product-centric which leads to the idea that “lower-level pieces are OpenStack dependencies, rather than OpenStack itself”

Thierry posits that OpenStack dependencies can and should be developed in whatever language best suits the task at hand and so doing is relatively less costly from an OpenStack community standpoint.  Chris Dent notes that a similar way to make this distinction is whether the tool is useful and usable outside OpenStack.

Welcome Keystone to the World of Python 3

Morgan Fainberg notes with thanks to all involved that Keystone is now Python 3.4 compatible.  Nice work :)

Austin OpenStack Summit Wrapup – Part IV

No new posts with specific Summit wrap-ups in them but as mentioned last week I’ve now pulled together an as concise as I could make it summary of those posts in a blog post here. If there are further updates I’ll edit the post accordingly.

Notable Discussions – other OpenStack lists

As noted above, as of this week I’m trialing watching what’s happening on the openstack-operators and openstack-community lists as well…

Defining ‘users’, planning ops mid-cycles and related meetings

Over on the openstack-operators mailing list, Chris Morgan wrote a summary of one of the discussions at the Ops Meetup Team IRC meeting (!)  Of note and worth a quick read is the thoughtful definition of ‘users’ for the purposes of working out who should attend Operator Mid-Cycles.

In short the preference is for people involved in large scale public and private clouds to attend, more so than vendors of said clouds.  However individuals who work for large scale cloud vendors are encouraged to attend if they feel they can contribute, but are asked to wear their user rather than “promotional” hat (I paraphrase this latter).

In a related thread on openstack-operators Tom Fifield announces the meeting in question here and provides a neat summary a few days later in this post. The regular IRC meetings will occur every second Tuesday at 1400h UTC in the #openstack-operators channel.

Update on Non-ATC Recognition

An email from Edgar Magana prompted the ever efficient Shamail Tahir to give a quick summary of where this process is up to.

I defer to Shamail’s email for the details but the desire to have a way to recognise contributors to OpenStack that don’t quite fit the Active Technical Contributor (ATC) definition has led to defining an Active User Contributor (AUC).  This process is ongoing but will provide a defined way of identifying folk that fit this mold and so their contribution to OpenStack more generally.

Upcoming OpenStack Events


Don’t forget the OpenStack Foundation’s Events Page for a comprehensive, frequently updated list.

People and Projects

Vulnerability Management Team changes

PTL/Core nominations & changes

Further Reading & Miscellanea

Don’t forget these excellent sources of OpenStack news – most recent ones linked in each case

This edition of Lwood brought to you by Nick Menza and OHM (Soultone Cymbals 10th Anniversary show, with condolences to Nick’s family, friends and fans), Robert Plant (Now and Zen), Rush (A Show of Hands) amongst other tunes.

by hugh at May 23, 2016 12:06 PM

Design Summit evolution, operating at scale, and more OpenStack news

Are you interested in keeping track of what is happening in the open source cloud? is your source for news in OpenStack, the open source cloud infrastructure project.

by Jason Baker at May 23, 2016 06:59 AM

May 21, 2016

David Moreau Simard

ARA: An idea to store, browse and troubleshoot Ansible Playbook runs

The context

Ansible can be used for a lot of things and it’s grown pretty popular for managing servers and their configuration.

In the RDO and OpenStack communities, Ansible is heavily used to deploy or test OpenStack through Continuous Integration (CI). Projects like TripleO-Quickstart, WeIRDO, OpenStack-Ansible or Zuul v3 are completely driven by Ansible.

In the world of automated continuous integration, it’s not uncommon to have hundreds, if not thousands of jobs running every day for testing, building, compiling, deploying and so on.

Keeping up with a large amount of Ansible runs and their outcome, not just in the context of CI, is challenging.

The idea

ARA is an idea I came up with to try and make Ansible runs easier to visualize, understand and troubleshoot.

ARA is three things:

  1. An Ansible callback plugin to record playbook runs into a local or remote database
  2. A CLI client to query the database
  3. A web interface to visualize the database

ARA organizes recorded playbook data in a way that makes it intuitive for you to search and find what you’re interested in as fast and as easily as possible.

It provides summaries of task results per host or per playbook.

It allows you to filter task results by playbook, play, host, task or by the status of the task.

With ARA, you’re able to easily drill down from the summary view for the results you’re interested in, whether it’s a particular host or a specific task.

Beyond browsing a single ansible-playbook run, ARA supports recording and viewing multiple runs in the same database.

This allows you to, for example, recognize patterns (ex: this particular host is always failing this particular task) since you have access to data from multiple runs.

ARA is an open source project available on Github under the Apache v2 license. Documentation and frequently asked questions are available on

Why ?

As I mentioned before, the vast majority of the RDO CI is powered by Ansible. When a job build fails, I have to look at one of these Jenkins console logs that’s >8000 lines long. If it doesn’t crash my browser, I get to dig across the large amount of output to try and figure out what went wrong in the job build.

When you’re testing OpenStack trunk, you’re going to be troubleshooting a lot of those large failed jobs and it’s painful. Over time, I’ve (unfortunately) gotten used to it and got pretty good at it, actually. However, it still takes me a non-negligible amount of time just to find where Ansible failed so I know where to start searching in the logs.

It’s also definitely a nightmare when someone else wants to look at the jobs to try and understand what happened.

ARA solves that pain point - and many others - by making it easier to browse the results of a playbook.

Other attempts

Before ARA was born, we leveraged two callback plugins to help us parse Ansible playbook output.

The first is a callback that pretty-prints output from tasks like “command” or “yum”. We also use profile_tasks, which is built-in and helps by showing how much time each task took.

These callbacks are definitely helpful for small playbooks or playbooks that contain small or short-running tasks. On long-running playbooks with a large amount of output, they almost make matters worse by adding even more output into the task results.

How do I get started with ARA ?

I’ve tried to do simple, yet effective documentation on how to get started with ARA.

1) Install ARA


First, you’ll need to install some packaged dependencies and then you can install ARA from source or from pip.

For example on a CentOS server:

yum -y install gcc python-devel libffi-devel openssl-devel
pip install ara

2) Configure the callback

Documentation: (What’s an Ansible Callback ?)

The configuration of the callback is simple and seamless. You want to add the following to your ansible.cfg file:

callback_plugins = /usr/lib/python2.7/site-packages/ara/callback

# Or, if using a virtual environment, for example

callback_plugins = $VIRTUAL_ENV/lib/python2.7/site-packages/ara/callback
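Hard-coding a `python2.7/site-packages` path like the above breaks as soon as the Python version or install location changes. As a small convenience (my own sketch, not part of ARA itself), the callback directory can be computed from wherever the `ara` package is actually installed:

```python
import importlib.util
import os

def callback_dir(package="ara"):
    """Return <package install dir>/callback, or None if the package
    is not importable. Works the same inside or outside a virtualenv,
    since it asks Python itself where the package lives."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.join(list(spec.submodule_search_locations)[0], "callback")

print(callback_dir() or "ara is not installed in this environment")
```

Paste the printed path as the `callback_plugins` value in your ansible.cfg.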

3) Run a playbook with ansible-playbook

Run your favorite playbook !

4.1) Browse your data through the CLI


$ ara result list
| ID                                   | Host        | Task           | Status | Ignore Errors | Time Start                 | Time End                   |
| a73efa33-0d1e-4a7d-8e28-a76fa93b9377 | localhost   | Debug thing    | ok     | False         | 2016-05-21 14:42:24.794070 | 2016-05-21 14:42:24.837268 |

$ ara result show a73efa33-0d1e-4a7d-8e28-a76fa93b9377
| Field         | Value                                              |
| ID            | a73efa33-0d1e-4a7d-8e28-a76fa93b9377               |
| Host          | localhost                                          |
| Task          | Debug thing (d04a5828-d32f-4349-89f1-39d7400b328f) |
| Status        | ok                                                 |
| Ignore Errors | False                                              |
| Time Start    | 2016-05-21 14:42:24.794070                         |
| Time End      | 2016-05-21 14:42:24.837268                         |

4.2) Browse your data through the web interface

Documentation: (What does the web UI look like ?)

Fire off the bundled webserver:

$ ara-manage runserver
 * Running on (Press CTRL+C to quit)

And use your favorite browser.

There’s no step five !

We’re all done here. That’s the gist of it.

A lot of effort was made towards making ARA as simple to install, configure and use as possible. It is meant to run from start to finish locally, but it is also powerful enough to aggregate runs into a central server.

Discussing or contributing to ARA

If you’d like to use ARA or contribute to the project, definitely feel free ! Feedback, comments, ideas and suggestions are quite welcomed as well.

I hang out in the ##ara channel on freenode IRC if you want to come chat about ARA.

(Yes, that’s two “#” while I discuss the recovery of “#ara” with the freenode staff)

Special thanks to Lars Kellogg-Stedman for the early feedback on the project, ideas and code contributions. He was very helpful in fleshing out and maturing the idea into something better.

by dmsimard at May 21, 2016 07:00 PM


Why the world needs private clouds

The post Why the world needs private clouds appeared first on Mirantis | The Pure Play OpenStack Company.

Lois Lane won a Pulitzer in Superman Returns for her article “Why the World Doesn’t Need Superman” [spoiler alert] only to reverse her views, in a rather emotional moment, with the article “Why the World Needs Superman”.


Many industry pundits and analysts hold negative sentiments about private clouds. (In other words, “The World Doesn’t Need Private Clouds”.) In my opinion, these detractors will ultimately change their view, just like Lois Lane did.

Right off the bat I can think of 10 solid reasons why private clouds are here to stay. The $130B+ datacenter and virtualization infrastructure market is simply too large to jump onto one single bandwagon. This means the world will be a combination of public and private clouds, and here’s why.

Reason #1: Cost

Marketing 101, don’t lead with cost. Customer reality 101, it’s all about the cost. In private clouds with over 2,000 virtual machines, our customers are seeing a cost reduction of 40-60% as compared to public clouds (see example of a credit card company, media giant, enterprise CRM SaaS vendor). Another successful OpenStack user, Tubemogul, publicly claims a 30% savings and a reduction in server footprint by using OpenStack instead of AWS. These types of savings are also validated by the 451 Cloud Price Index. We are talking about real dollars saved that you can use for something else.

We recently launched a brand new AWS vs. OpenStack calculator. Check it out and see how much you could save.

Reason #2: Integration with On-prem Data

Unless you are a brand new startup, you probably have legacy systems that your new applications need to integrate with. Perhaps it’s a customer database, perhaps it’s an inventory system. Of course, it doesn’t have to be legacy data. It could be new IoT, analytics or other digital data you will be generating. Unless you plan to host all of this data on the public cloud, you probably need to consider a private cloud. Sure, you could purchase a direct link to a public cloud and host your apps there, but all this data transfer is likely to get very expensive if there is a lot of data going back and forth. One of our media customers chose a private cloud because one set of their apps needed access to an on-prem image database. They continue to use the public cloud for other apps.

Reason #3: Geographic Availability

Public clouds are not available everywhere in the world. If you operate in geographies without a public cloud and need to meet specific data residency laws, a private cloud may be the only option. One of the reasons why the Volkswagen group chose a private cloud is because they operate in 167 locations around the world with varying data residency policies.

Reason #4: Government Access

In a public cloud, governments can issue subpoenas and get access to your data (see AWS or Azure agreements). Further, see the recent fight between the government and Microsoft over secret demands for customer data where the government can make blind subpoena requests, and the owner of the data will never even know that their data was relinquished.

The plot thickens.

Even if your data is not subpoenaed, it could get turned over if you and the subpoenaed tenant are sharing the same hardware – a distinct possibility in a multitenant public cloud. Not to make your head hurt, but the status of metadata is not specified in the above agreements. For example, is the pattern of how many VMs you start and stop, rate of I/Os  to your storage volumes or buckets, and traffic in and out of your workload protected by public cloud providers’ privacy policy? Or is the public cloud vendor and their partners able to see that metadata? So, unless you are 100% comfortable with your data being turned over to the government and other gray areas in terms of privacy, private clouds might be something to consider.

A large SaaS customer of ours providing security related services felt there was no way they would move their company secrets and proprietary technology to the public cloud and instead chose a private cloud.

Reason #5: Compliance

If you are in highly regulated industries with strict compliance requirements, you might have to stick with a private cloud. Take the gaming industry for instance. In the US, a casino cannot host any of their games on a public cloud and that is why one of our customers is using a private cloud. Even without hard government restrictions, if you need things like detailed security logs in the event of a breach, you might not be able to get them from a public cloud provider. A large telco customer chose an OpenStack private cloud since AWS was unwilling to provide access to all the information they needed to meet their compliance needs.

Reason #6: Long Term Business Continuity

Businesses come and go. High flying technology companies like DEC, Sun Microsystems, Palm, Blackberry, AOL were once infallible. Over a 5, 10, 15 year period, there is no guarantee that your public cloud provider will still be in business. The problem with technology is that a new innovation can literally blow incumbents out of the water. You are probably thinking, not my problem. I’ll be long gone and my successor can migrate all those workloads to another cloud. Easier said than done. If your organization puts petabytes (or zettabytes by then) of data into a public cloud, moving your workloads is not going to be fun. It will be a race against time as the end-of-life letter will have a finite timeframe at the end at which the cloud will be turned off. See Nirvanix’s cloud demise where customers were given just a couple of weeks. This is why research labs, governments and public institutions are adopting private clouds.

Reason #7: Vendor Lock-in

Even if you are not worried about long-term issues, you might want flexibility in moving your workloads between public clouds and possibly a private cloud. In this case, you don’t want to be locked into one particular cloud vendor’s APIs. By using a private cloud with open APIs or an open CMP (cloud management platform) you avoid vendor lock-in and can seamlessly move workloads to virtually any public cloud. OpenStack is yet to fully meet the promise of cloud portability, although it is on the roadmap.

Reason #8: Unique Technical or Business Requirements

If your team requires specific features not available in a public cloud, you will need to consider a private cloud. This factor might be obvious, but is worth stating anyway, because your requirements may be more “unique” than you think.

For example, if your team needs a specific FPGA adapter for deep learning, a server with specific network function virtualization (NFV) acceleration features, VM flavors not available on the public cloud, specific network traversal requirements, or integration with a specific PaaS vendor, you will need to create a private cloud.

Additionally, public cloud SLAs are anemic. They typically only provide availability SLAs, and the partial refunds for failing to meet them are not exactly thrilling. If your team needs better availability, data durability, I/O latency, performance, or other SLAs, again you may have to run a private cloud.

Imagine you are an automobile company running apps to power autonomous cars. I can’t imagine risking the cars’ autonomous operation SLAs by running your apps on a public cloud. See a recent example of several businesses affected by public cloud failures. Similarly, putting the primary copy of medical images or the golden image of your blockbuster movie in a public cloud that lacks data durability SLAs might raise concerns. In addition to performance SLAs, there may be unique requirements on support SLAs as well that could drive the need for a private cloud.

Reason #9: Being Measured on EBITDA

Most people love the fact that public clouds are accounted for as an operational expense rather than a capital expense. However, if you are being measured on EBITDA (i.e. depreciation is not considered), a private cloud where you can capitalize hardware might be more attractive. Similarly, if you have a strict budgeting process, the variable expense of a public cloud might be troublesome.

Reason #10: Bare Metal Workloads

Last but not least are bare metal workloads. Public clouds are very secure when it comes to virtual machines — arguably more secure than private clouds, though they also have a bigger target painted on their backs. But I would not risk putting bare metal workloads on a public cloud since that would be a hacker’s dream come true! If you want to run Docker containers, HPC, analytics or machine learning workloads directly on bare metal servers you probably want to consider a private cloud using something like OpenStack’s Ironic project.

What it boils down to is if Agile IT is strategic to your business, and you have cloud use cases that are both deep and narrow, there are plenty of reasons why you need to consider private cloud.

The post Why the world needs private clouds appeared first on Mirantis | The Pure Play OpenStack Company.

by Amar Kapadia at May 21, 2016 02:24 AM

May 20, 2016

Boost your OpenStack IQ with these tutorials

Every month, we scour the Internet for the best in community-created OpenStack tips and tutorials.

by Jason Baker at May 20, 2016 07:01 AM

May 19, 2016

OpenStack Nova Developer Rollup

Nova Weekly Team Meeting (R21)

Meeting log here:

  • June 2: newton-1, non-priority spec approval freeze
  • Keystone Fernet token as default was reverted in Devstack to resolve build issues
  • We have 59 approved blueprints: 6 are completed, 5 have not started, 3 are blocked
  • BDM v1 and miscellaneous API deprecation to be grouped into the extensions cleanup spec


  • Need help with cells v2 testing in the gate
  • Currently investigating if using transport_url (if configured) could make upgrading easier
    • Also discussion around deprecating the old driver specific messaging options to encourage deployers to set transport_url
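For reference, a transport_url setting in nova.conf might look like the following; the host, credentials, and port here are placeholders, assuming a RabbitMQ backend:

```ini
[DEFAULT]
# format: rabbit://user:password@host:port/virtual_host
transport_url = rabbit://nova:secretpass@rabbit.example.com:5672/
```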


Live Migration

  • Experimental test coverage in the gate now has access to the latest qemu and libvirt via Ubuntu 16.04
  • Also increased CI coverage for storage backends
  • Many of the changes to storage pools have merged




by auggy at May 19, 2016 11:05 PM

Adam Young

Reproducing an Open vSwitch Bridge Configuration

In the previous post, I described the setup for installing FreeIPA on a VM parallel to the undercloud VM setup by Tripleo Quickstart. The network on the undercloud VM has been setup up by Ironic and Neutron to listen on a network defined for the overcloud. I want to reproduce this on a second machine that is not enrolled in the undercloud. How can I reproduce the steps?


This is far more complex than necessary. All I needed to do was:

sudo ip addr add dev eth1
sudo ip link set eth1 up

To get connectivity, then persist that info in /etc/sysconfig/network-scripts/ifcfg-eth1
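For reference, a minimal ifcfg-eth1 along those lines might look like this sketch; the address is a placeholder, so use whatever your ctlplane network actually dictates:

```shell
# Hypothetical /etc/sysconfig/network-scripts/ifcfg-eth1 sketch;
# IPADDR/NETMASK below are placeholders, not the real ctlplane values.
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.0.2.10
NETMASK=255.255.255.0
```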

But the OVS “cloning” here is still interesting enough to warrant its own post.


Using Tripleo Quickstart, I see that the interface I need is created with:

sudo bash -c 'cat < /etc/sysconfig/network-scripts/ifcfg-vlan10

sudo ifup ifcfg-vlan10

But my VM does not have an OVS_BRIDGE br-ctlplane defined. How do I create that?

Using the ovs commands, I can look at the bridge definition:

$ sudo ovs-vsctl show
    Bridge br-ctlplane
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "eth1"
            Interface "eth1"
    Bridge br-int
        fail_mode: secure
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
        Port br-int
            Interface br-int
                type: internal
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
    ovs_version: "2.5.0"

And that does not exist on the new VM. I’ve been able to deduce that the creation of this bridge happened as a side effect of running

openstack undercloud install

Since I don’t want an undercloud on my other node, I need to reproduce the OVS commands to build the bridge.

I’m in luck. These commands are all captured in /etc/openvswitch/conf.db I can pull them out with:

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null >

That gets me:

 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"a9460ec6-db71-42fb-aec7-a5356bcda153\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"
 ovs-vsctl -t 10 -- --may-exist add-br br-ctlplane -- set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 ovs-vsctl -t 10 -- --if-exists del-port br-ctlplane eth1 -- add-port br-ctlplane eth1
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-int -- set Bridge br-int datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set-fail-mode br-int secure
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-int protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-br br-ctlplane -- set Bridge br-ctlplane datapath_type=system
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Bridge br-ctlplane protocols=OpenFlow10
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-int int-br-ctlplane -- set Interface int-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- --may-exist add-port br-ctlplane phy-br-ctlplane -- set Interface phy-br-ctlplane type=patch options:peer=nonexistent-peer
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface int-br-ctlplane options:peer=phy-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Interface phy-br-ctlplane options:peer=int-br-ctlplane
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- add-port br-int tapacff1724-9f -- set Interface tapacff1724-9f type=internal external_ids:iface-id=acff1724-9fb2-4771-a7db-8bd93e7f3833 external_ids:iface-status=active external_ids:attached-mac=fa:16:3e:f6:6d:86
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f other_config:tag=1 other_config:physical_network=ctlplane other_config:net_uuid=6dd40444-6cc9-4cfa-bfbd-15b614f6e9e1 other_config:network_type=flat
 /bin/ovs-vsctl --timeout=10 --oneline --format=json -- set Port tapacff1724-9f tag=1
 ovs-vsctl -t 10 -- --may-exist add-port br-ctlplane vlan10 tag=10 -- set Interface vlan10 type=internal

Now I don’t want to blindly re-execute this, as there are some embedded values particular to the first machine. The MAC 00:59:cf:9c:84:3a for eth1 is reused by the bridge. The first two lines look like system-specific setup. Let’s see if the new VM has anything along these lines.

Things to note:

  1. /etc/openvswitch/ is empty
  2. systemctl status openvswitch.service shows the service is not running

Let’s try starting it:
sudo systemctl start openvswitch.service

grep '^\{'  /etc/openvswitch/conf.db | jq '. | ._comment ' | sed -e 's!^\"!!g' -e's!ovs-vsctl:!!' -e 's!\"$!!'   | grep -v null 
 ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
 ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.5.0 \"external-ids:system-id=\\\"8f68fbfb-9278-4772-87f1-500bc80bb917\\\"\" \"system-type=\\\"CentOS\\\"\" \"system-version=\\\"7.2.1511-Core\\\"\"

So we can drop those two lines.

Extract the MAC for interface eth1:

ip addr show eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
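The MAC extraction and the edit to the captured script can also be scripted rather than done by hand. A sketch, where the script filename rebuild-bridge.sh is hypothetical:

```shell
# Pull the link/ether address out of `ip link show` output on stdin.
mac_from_ip_output() {
  awk '/link\/ether/ {print $2}'
}

# Substitute the new machine's eth1 MAC for the old machine's MAC in the
# extracted ovs-vsctl script (filename is hypothetical).
OLD_MAC="00:59:cf:9c:84:3a"
NEW_MAC=$(ip link show eth1 2>/dev/null | mac_from_ip_output)
# sed -i "s/${OLD_MAC}/${NEW_MAC}/g" rebuild-bridge.sh
```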

OK, that is about all we can do. Execute it.

sudo ./

No complaints. What did we get?

$ sudo ovs-vsctl show
    Bridge br-int
        fail_mode: secure
        Port "tapacff1724-9f"
            tag: 1
            Interface "tapacff1724-9f"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
    Bridge br-ctlplane
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "vlan10"
            tag: 10
            Interface "vlan10"
                type: internal
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port "eth1"
            Interface "eth1"
    ovs_version: "2.5.0"

Looks right.

One thing I notice is different: on the undercloud, the bridge has an IP address:

7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3a brd ff:ff:ff:ff:ff:ff
    inet brd scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843a/64 scope link 
       valid_lft forever preferred_lft forever

Let’s add one to the bridge on our new machine:

$ cat /etc/sysconfig/network-scripts/ifcfg-br-ctlplane
# This file is autogenerated by os-net-config
OVS_EXTRA="set bridge br-ctlplane other-config:hwaddr=00:59:cf:9c:84:3a -- br-set-external-id br-ctlplane bridge-id br-ctlplane"

Again, minor edits to use the proper MAC and a different IP address. Bring it up with:

sudo ifup br-ctlplane

And we can see it:

$ ip addr show br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet brd scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

Last step: we need to bring up the eth1 interface. Again, give it a config file, this time in /etc/sysconfig/network-scripts/ifcfg-eth1


And bring it up with:

sudo ifup eth1

Make sure it is up:

$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 00:59:cf:9c:84:3e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::259:cfff:fe9c:843e/64 scope link 
       valid_lft forever preferred_lft forever

And usable:

$  ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=1.41 ms
64 bytes from icmp_seq=2 ttl=64 time=0.627 ms

I’d really like to laud the Open vSwitch developers for their approach to the database. Having the commands available in the database is a fantastic tool. That is a pattern I would love to see emulated elsewhere.

by Adam Young at May 19, 2016 10:53 PM

Installing FreeIPA on a Tripleo undercloud

I’ve been talking about using FreeIPA to secure OpenStack since the Havana summit in Portland. I’m now working with Tripleo to install OpenStack. To get the IPA server installed along with Tripleo Quickstart requires a VM accessible from the Ansible playbook.

Build the Identity VM

  • Apply the patch to quickstart that builds the VM
  • Run quickstart at least up to the undercloud stage. The steps below do the complete install.

Since Quickstart makes a git repo under ~/.quickstart, I’ve been using that as my repo. It avoids duplication, and makes my changes visible.

mkdir ~/.quickstart
cd ~/.quickstart
git clone
cd tripleo-quickstart
git review -d 315749
~/.quickstart/tripleo-quickstart/   -t all

If you are not set up for git review, you can pull the patch manually from Gerrit.
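A manual pull might look like the following sketch, assuming the standard review.openstack.org remote for tripleo-quickstart; the Gerrit ref format is refs/changes/&lt;last two digits&gt;/&lt;change number&gt;/&lt;patchset&gt;, and patchset 1 is an assumption here:

```shell
# Hypothetical sketch: build the Gerrit ref for change 315749 so it can be
# fetched without git-review. Patchset 1 is assumed.
REF="refs/changes/49/315749/1"
echo "git fetch https://review.openstack.org/openstack/tripleo-quickstart ${REF} && git checkout FETCH_HEAD"
```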

Set the hostname FQDN for the identity machine

ssh -F /home/ayoung/.quickstart/ssh.config.ansible identity-root hostnamectl set-hostname --static

Add variables to the inventory file ~/.quickstart/hosts

ipa_forwarder=<these values come from resolv.conf on warp>
nameserver=<these values come from resolv.conf on warp>

Activate the Venv:

. ~/.quickstart/bin/activate

Use Rippowam branch

cd ~/devel
git clone
cd rippowam
git checkout origin/tripleo

Run ansible

ansible-playbook -i ~/.quickstart/hosts ~/devel/rippowam/ipa.yml

Making this VM available to the overcloud requires some network wizardry. That deserves a post itself.

by Adam Young at May 19, 2016 10:43 PM

OpenStack Blog

FAQ: Evolving the OpenStack Design Summit

Please join us for a community town hall on May 25 at 11:30 UTC or 19:00 UTC (to cover as many timezones as possible) to talk through the plans, answer questions and provide your input.

As a result of community discussion, the OpenStack Foundation is evolving the format of the events it produces for the community starting in 2017. The proposal is to split the current Design Summit, which is held every six months as part of the main OpenStack Summit, into two parts: a “Forum” at the main Summit for cross-community discussions and user input (we call this the “what” discussions), and a separate “Project Teams Gathering” event for project team members to meet and get things done (the “how” discussions and sprinting). The intention is to alleviate the need for a separate mid-cycle, so development teams would continue to meet four times per year, twice with the community at large and twice in a smaller, more focused environment. The release cycle would also shift to create more space between the release and Summit. The change triggered a lot of fears and questions — the intent of this FAQ is to try to address them.


Q: How is the change helping upstream developers?


A: During the Summit week, upstream developers have a lot of different goals. We leverage the Summit to communicate new things (give presentations), learn new things (attend presentations), get feedback from users and operators over our last release, gather pain points and priorities for upcoming development, propose changes and see what the community thinks of them, recruit and on-board new team members, have essential cross-project discussions, meet with our existing project team members, kickstart the work on the new cycle, and get things done. There is just not enough time in 4 or 5 days to do all of that, so we usually drop half of those goals. Most will skip attending presentations. Some will abandon the idea of presenting. Some will drop cross-project discussions, resulting in them not having the critical mass of representation to actually serve their purpose. Some will drop out of their project team meeting to run somewhere else. The time conflicts make us jump between sessions, resulting in us being generally unavailable for listening to feedback, pain points, or newcomers. By the end of the week we are so tired we can’t get anything done. We need to free up time during the week. There are goals that can only be reached in the Summit setting, where all of our community is represented — we should keep those goals in the Summit week. There are goals that are better reached in a distraction-free setting — we should organize a separate event for them.

Q: What is the “Forum” ?


A: “Forum” is the codename for the part of the Design Summit (Ops+Devs) that would still happen at the main Summit event. It will primarily be focused on strategic discussions and planning for the next release (the “what”), essentially the start of the next release cycle even though development will not begin for another 3 months. We should still take advantage of having all of our community (Devs, Ops, End users…) represented to hold cross-community discussions there. That means getting feedback from users and operators on specific projects in our last release, gathering pain points and priorities for upcoming development, proposing changes and seeing what the community thinks of them, and recruiting and on-boarding new team members. We’d like to do that in a neutral space (rather than have separate “Ops” and “Dev” days) so that the discussion is not influenced by who owns the session. This event would happen at least two months after the previous release, to give users time to test and bring valuable feedback.

Q: What is the “Project Teams Gathering” ?


A: “Project Teams Gathering” is the codename for the part of the Design Summit that will now happen as a separate event. It will primarily provide space for project teams to make implementation decisions and start development work (the “how”). This is where we’d have essential cross-project discussions, meet with our existing project team members, generate shared understanding, kickstart the development work on the new cycle, and generally get things done. OpenStack project teams would be given separate rooms to meet for one or more days, in a loose format (no 40-min slots). If you self-identify as a member of a specific OpenStack project team, you should definitely join. If you are not part of a specific project team (or can’t pick one team), you could still come but your experience of the event would likely not be optimal, since the goal of the attendees at this event is to get things done, not listen to feedback or engage with newcomers. This event would happen around the previous release time, when developers are ready to fully switch development work to the new cycle.

Q: How is the change helping OpenStack as a whole?


A: Putting the larger Summit event further away from last release should dramatically improve the feedback loop. Currently, calling for feedback at the Summit is not working: users haven’t had time to use the last release at all, so most of the feedback we collect is based on the 7-month old previous release. It is also the wrong timing to push for new features: we are already well into the new cycle and it’s too late to add new priorities to the mix. The new position of the “Forum” event with respect to the development cycle should make it late enough to get feedback from the previous release and early enough to influence what gets done on the next cycle. By freeing up developers time during the Summit week, we also expect to improve the Summit experience for all attendees: developers will be more available to engage and listen. The technical content at the conference will also benefit from having more upstream developers available to give talks and participate in panels. Finally, placing the Summit further away from the release should help vendors prepare and announce products based on the latest release, making the Summit marketplace more attractive and relevant.


Q: When will the change happen ?


A: Summits are booked through 2017 already, so we can’t really move them anytime soon. Instead, we propose to stagger the release cycle. There are actually 7 months between Barcelona and Boston, so we have an opportunity there to stagger the cycle with limited impact. The idea would be to do a 5-month release cycle (between October and February), place our first project teams gathering end-of-February, then go back to 6-month cycles (March-August) and have the Boston Summit (and Forum) in the middle of it (May). So the change would kick in after Barcelona, in 2017. That gives us time to research venues and refine the new event format.

Q: What about mid-cycles ?


A: Mid-cycle sprints were organized separately by project teams as a way to gather team members and get things done. They grew in popularity as the distractions at the main Summit increased and it became hard for project teams to get together, build social bonds and generally be productive at the Design Summit. We hope that teams will get back that lost productivity and social bonding at the Project Teams Gathering, eliminating the need for separate team-specific sprints. 

Q: This Project Teams Gathering thing is likely to be a huge event too. How am I expected to be productive there? Or to be able to build social bonds with my small team?


A: Project Teams Gatherings are much smaller events compared to Summits (think 400-500 people rather than 7500). Project teams are placed in separate rooms, much like a co-located midcycle sprint. The only moment where everyone would meet would be around lunch. There would be no evening parties: project teams would be encouraged to organize separate team dinners and build strong social bonds.

Q: Does that new format actually help with cross-project work?

A: Cross-project work was unfortunately one of the things a lot of attendees dropped as they struggled with all the things they had to do during the Summit week. Cross-project workshops ended up being less and less productive, especially in getting to decisions or work produced. Mid-cycle sprints ended up being where the work could be done, but since they were organized separately, it was very costly for a member of a cross-project team (infrastructure, docs, QA, release management…) to attend them all. We basically set up our events in a way that made cross-project work prohibitively expensive, and then wondered why we had so much trouble recruiting people to do it. The new format ensures that we have a place to actually do cross-project work, without anything running against it, at the Project Teams Gathering. It dramatically reduces the number of places a Documentation person (for example) needs to travel to get some work done in-person with project team members. It gives project team members in vertical teams an option to break out of their silo and join such a cross-project team. It allows us to dedicate separate rooms to specific cross-project initiatives, beyond existing horizontal teams, to get specific cross-project work done.

Q: Are devs still needed at the main Summit?


A: Upstream developers are still very much needed at the main Summit. The Summit is (and always was) where the feedback loop happens. All project teams need to be represented there, to engage in planning, collect the feedback on their project, participate in cross-community discussions, reach out to new people and on-board new developers. We also very much want to have developers give presentations at the conference portion of the Summit (we actually expect that more of them will have free time to present at the conference, and that the technical content at the Summit will therefore improve). So yes, developers are still very much needed at the main Summit.

Q: My project team falls apart if the whole team doesn’t meet in person every 3 months. We used to do that at the Design Summit and at our separate mid-cycle project team meeting. I fear we’ll lose our ability to all get together every 3 months.


A: As mentioned earlier, we hope the Project Teams Gathering to be a lot more productive than the current Design Summit, reducing the need for mid-cycle sprints. That said, if you really still need to organize a separate mid-cycle sprint, you should definitely feel free to do so. We plan to provide space at the main Summit event so that you can hold mid-cycle sprints there and take advantage of the critical mass of people already around. If you decide to host a mid-cycle sprint, you should communicate that your team mid-cycle will be co-located with the Summit and that team member attendance is strongly encouraged.

Q: We are a small team. We don’t do mid-cycles currently. It feels like that with your change, we’ll have to travel to two events per cycle instead of one.


A: You need to decide if you feel the need to get the team all together to get some work done. If you do, you should participate (as a team) in the Project Teams Gathering. If you don’t, your team should skip it. The PTL and whoever is interested in cross-project work in your team should still definitely come to the Project Teams Gathering, but you don’t need to get every single team member there, as you would not have a team room there. In all cases, your project wants to have some developers present at the Summit to engage with the rest of the community.

Q: The project I’m involved with is mostly driven by a single vendor, most of us work from the same office. I’m not sure it makes sense for all of us to travel to a remote location to get some work done !


A: You are right, it doesn’t. We’ll likely not provide specific space at the Project Teams Gathering for single-vendor project teams. The PTL (and whoever else is interested) should probably still come to the Project Teams Gathering to participate in cross-project work. And you should also definitely come to the Summit to engage with other organizations and contributors and increase your affiliation diversity to the point where you can take advantage of the Project Teams Gathering.

Q: I’m a translator, should I come to the Project Teams Gathering?


A: The I18n team is of course free to meet at the Project Teams Gathering. However, given the nature of the team (large number of members, geographically-dispersed, coming from all over our community, ops, devs, users), it probably makes sense to leverage the Summit to get translators together instead. The Summit constantly reaches out to new communities and countries, while the Project Teams Gathering is likely to focus on major developer areas. We’ll likely get better outreach results by holding I18n sessions or workshops at the “Forum” instead.

Q: A lot of people attend the current Design Summit to get a peek at how the sausage is made, which potentially results in getting involved. Doesn’t the new format jeopardize that on-boarding?


A: It is true that the Design Summit was an essential piece in showing how open design worked to the rest of the world. However, that was always done at the expense of existing project team members’ productivity. Half the time in a 40-min session would be spent summarizing the history of the topic for newcomers. Lively discussions would be interrupted by people in the back asking that participants use the mike. We tried to separate fishbowls and workrooms at the Design Summit, to separate discussion/feedback sessions from team-member work sessions. That worked for a time, but people started working around it, making some work rooms look like overcrowded fishbowl rooms. In the end that made for a miserable experience for everyone involved and created a lot of community tension. In the new format, the “Forum” sessions will still allow people to witness open design at work, and since those are specifically set up as listening sessions (rather than “get things done” sessions), we’ll take time to engage and listen. We’ll free up time for specific on-boarding and education activities. Fewer conflicts during the week mean we won’t always be running to our next session and will likely be more available to reach out to others in the hallway track.

Q: What about the Ops midcycle meetup?


A: The Ops meetups are still happening, and for the next year or two probably won’t change much at all. In May, the “Ops Meetups Team” was started to answer the questions about the future of the meetups, and also actively organize the upcoming ones. Part of that team’s definition: “Keeping the spirit of the ops meetup alive” – the meetups are run by ops, for ops and will continue to be. If you have interest, join the team and talk about the number and regional location of the meetups, as well as their content.

Q: What about ATC passes for the Summit?


A: The OpenStack Foundation gave discounted passes to a subset of upstream contributors (not all ATCs) who contributed in the last six months, so that they could more easily attend the Summit. We’ll likely change the model since we would be funding a second event, but will focus on minimizing costs for people who have to travel to both the Summit and the Project Teams Gathering. The initial proposal is to charge a minimal fee for the Project Teams Gathering (to better gauge attendance and help keep sponsorship presence to a minimum), and then anyone who was physically present at the Project Teams Gathering would receive a discount code to attend the next Summit. Something similar is also being looked into for contributors represented by the User Committee (eg. ops). At the same time, we’ll likely beef up the Travel Support Program so that we can get all the needed people at the right events.


If you have additional questions in mind, please join us for the virtual town hall next week and email them to or to make sure we address them during the session. We will also make the recording available for those who cannot attend.

by OpenStack at May 19, 2016 10:41 PM

Ronald Bradford

are you running KVM or QEMU launched instances?

A recent operators mailing list thread asked this question regarding the OpenStack user survey results of April 2016 (See page 39).

While verifying my own local multi-node devstack environment on dedicated hardware with various commands, I initially came across the following error (which later turned out to be misleading).

$ virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking for device /dev/kvm                                         : FAIL (Check that the 'kvm-intel' or 'kvm-amd' modules are loaded & the BIOS has enabled virtualization)
  QEMU: Checking for device /dev/vhost-net                                   : WARN (Load the 'vhost_net' module to improve performance of virtio networking)
  QEMU: Checking for device /dev/net/tun                                     : PASS
   LXC: Checking for Linux >= 2.6.26                                         : PASS

What follows is a collated list of commands gathered from various sources, and the output of each in my Ubuntu 14.04 LTS environment.

# Are you running 64-bit architecture (0=bad; >0 is good)
$ egrep -c ' lm ' /proc/cpuinfo

# Does your processor support hardware virtualization (0=bad; >0 is good)
$ egrep -c '^flags.*(vmx|svm)' /proc/cpuinfo

# Are you running a 64-bit OS
$ uname -m

# Have I installed the right Ubuntu packages
$ dpkg -l | egrep '(libvirt-bin|kvm|ubuntu-vm-builder|bridge-utils)'
ii  bridge-utils                        1.5-6ubuntu2                          amd64        Utilities for configuring the Linux Ethernet bridge
ii  libvirt-bin                         1.2.2-0ubuntu13.1.17                  amd64        programs for the libvirt library
ii  qemu-kvm                            2.0.0+dfsg-2ubuntu1.24                amd64        QEMU Full virtualization

# Have packages configured user privileges
$ grep libvirt /etc/passwd /etc/group
/etc/passwd:libvirt-qemu:x:108:115:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false
/etc/passwd:libvirt-dnsmasq:x:109:116:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/bin/false

# Have I configured QEMU to use KVM
$ cat /etc/modprobe.d/qemu-system-x86.conf
options kvm_intel nested=1

# Have I loaded the KVM kernel modules
$ lsmod | grep kvm
kvm_intel             143630  3 
kvm                   456274  1 kvm_intel

# Are there any KVM related system messages
$ dmesg | grep kvm
[ 2030.719215] kvm: zapping shadow pages for mmio generation wraparound
[ 2032.454780] kvm [6817]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd

# Can I use KVM?
$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

# Can I find a KVM device
$ ls -l /dev/kvm
crw-rw---- 1 root kvm 10, 232 May 11 14:15 /dev/kvm

# Have I configured nested KVM 
$ cat /sys/module/kvm_intel/parameters/nested
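
The individual checks above can be rolled into a small summary script. Here is a minimal sketch of my own (labels and commands follow the checks shown above; adjust to taste):

```shell
#!/bin/sh
# Roll the individual KVM readiness checks into one PASS/FAIL summary.
# (A sketch, not an official tool; commands mirror the checks above.)

check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        printf '%-40s PASS\n' "$label"
    else
        printf '%-40s FAIL\n' "$label"
    fi
}

check '64-bit CPU (lm flag)'          grep -q ' lm ' /proc/cpuinfo
check 'HW virtualization (vmx/svm)'   grep -Eq '^flags.*(vmx|svm)' /proc/cpuinfo
check '64-bit kernel'                 sh -c 'uname -m | grep -q 64'
check 'KVM kernel modules loaded'     sh -c 'lsmod | grep -q kvm'
check '/dev/kvm device present'       test -c /dev/kvm
check 'Nested KVM enabled'            grep -q Y /sys/module/kvm_intel/parameters/nested
```

Any FAIL line points you back at the corresponding individual check above for remediation.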

All of the above is the default output of a stock Ubuntu 14.04 install on my hardware with a correctly configured BIOS (verifying which requires a hard reboot, and a camera to record the proof).

Here is some more analysis after changing the BIOS settings.

$ sudo kvm-ok
INFO: /dev/kvm does not exist
HINT:   sudo modprobe kvm_intel
INFO: Your CPU supports KVM extensions
INFO: KVM (vmx) is disabled by your BIOS
HINT: Enter your BIOS setup and enable Virtualization Technology (VT),
      and then hard poweroff/poweron your system
KVM acceleration can NOT be used

When running a VirtualBox VM, the following is found.

$ sudo kvm-ok
INFO: Your CPU does not support KVM extensions
KVM acceleration can NOT be used

Now checking my OpenStack installation for related KVM needs.

# Have I configured Nova to use KVM virtualization
$ grep virt_type /etc/nova/nova.conf
virt_type = kvm

# Checking hypervisor type via API's
$ curl -s -H "X-Auth-Token: ${OS_TOKEN}" ${COMPUTE_API}/os-hypervisors/detail | $FORMAT_JSON | grep hypervisor_type
            "hypervisor_type": "QEMU",
            "hypervisor_type": "QEMU",

# Checking hypervisor type via OpenStack Client
$ openstack hypervisor show -f json 1 | grep hypervisor_type
  "hypervisor_type": "QEMU"
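
To pull out just the value without the grep noise, standard tools suffice. The JSON payload below is an illustrative sample in the shape of the API response, not taken from my environment; in practice you would pipe in the curl output shown above:

```shell
# Extract hypervisor_type from a sample os-hypervisors payload.
sample='{"hypervisors": [{"id": 1, "hypervisor_type": "QEMU", "state": "up"}]}'

printf '%s\n' "$sample" |
    grep -o '"hypervisor_type": *"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/'
# QEMU
```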

DevStack by default configures libvirt to use KVM.

After spinning up an instance, I ran the following additional checks.

# List running instances
$ virsh -c qemu:///system list
 Id    Name                           State
 2     instance-00000001              running

# Check processlist for KVM usage
$ ps -ef | grep -i qemu | grep accel=kvm
libvirt+ 19093     1 21 16:24 ?        00:00:03 qemu-system-x86_64 -enable-kvm -name instance-00000001 -S -machine pc-i440fx-trusty,accel=kvm,usb=off...
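
The same check can be scripted. Here is a small sketch that classifies a qemu command line (the sample line is condensed from the ps output above):

```shell
# Classify a qemu command line as KVM-accelerated or plain emulation.
cmdline='qemu-system-x86_64 -enable-kvm -name instance-00000001 -machine pc-i440fx-trusty,accel=kvm,usb=off'

case "$cmdline" in
    *accel=kvm*|*-enable-kvm*) echo 'KVM acceleration' ;;
    *accel=tcg*)               echo 'TCG (software emulation)' ;;
    *)                         echo 'unknown' ;;
esac
# KVM acceleration
```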

Information from the running VM in my environment.

$ ssh cirros@

$ egrep -c ' lm ' /proc/cpuinfo

$ egrep -c '^flags.*(vmx|svm)' /proc/cpuinfo

$ uname -m

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 6
model name	: QEMU Virtual CPU version 2.0.0

So while the ML thread does cover real confusion, the situation is that OpenStack reports the hypervisor type as QEMU even when, as my analysis shows, KVM acceleration is actually enabled. I consider the original question a valid problem for operators.

And finally, while this exercise was a lesson in understanding a little more about hypervisors and the commands available, the original puzzling output was simply an operator error: virt-host-validate needed sudo (while the other commands did not).

$ sudo virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking for device /dev/kvm                                         : PASS
  QEMU: Checking for device /dev/vhost-net                                   : PASS
  QEMU: Checking for device /dev/net/tun                                     : PASS
   LXC: Checking for Linux >= 2.6.26                                         : PASS


by ronald at May 19, 2016 08:21 PM

OpenStack Nova Developer Rollup

API-Ref Daily Update

We’re making a lot of great progress, but there’s still more work to do. If you’re interested in getting involved with OpenStack’s Nova team, this is a great place to get started!

API-ref status for May 19, 2016

Click here for more information about the API-Ref project.

by auggy at May 19, 2016 07:34 PM


NetApp + Mirantis: Continuing a Great OpenStack Collaboration

The post NetApp + Mirantis: Continuing a Great OpenStack Collaboration appeared first on Mirantis | The Pure Play OpenStack Company.

By Akshai Parthasarathy, Technical Marketing Engineer, NetApp and Christian Huebner, Storage Architect, Mirantis

We are pleased to announce an updated NetApp Mirantis Unlocked Reference Architecture for the Liberty-based Mirantis OpenStack 8.0 with NetApp’s leading enterprise storage platforms, clustered Data ONTAP and E-Series. As part of this effort, we have developed, validated, and certified a Fuel plugin for NetApp. This combination of resources makes it easier for you to architect and deploy OpenStack on a NetApp-based storage system (and vice versa).

NetApp on Mirantis OpenStack 8.0

The modular nature of OpenStack makes it possible to run OpenStack services on your desired hardware using the appropriate drivers. In the case of clustered Data ONTAP, the driver has been available in previous releases, but for MOS 8.0 it has been hardened for even greater reliability while continuing to offer the same feature-rich enterprise backend capabilities. Using clustered Data ONTAP as a storage backend for OpenStack provides a unified architecture, high storage efficiency, continuous operations, secure multi-tenancy, data protection, seamless scaling, and more.

E-Series for OpenStack enables simple, low-touch administration of flexible and reliable SAN storage, and provides extreme capacity and density, high performance (low latency, high IOPS), and availability. In the latest release of Mirantis OpenStack, E-Series volume group support, thin provisioning, and oversubscription capabilities have been introduced.

We strongly recommend that you use the latest version of Mirantis OpenStack (MOS 8.0, Liberty) and its NetApp Fuel plugin to avail yourself of the latest features. (If you still want to use the Kilo release, the MOS 7.0 Fuel plugin and MOS 7.0 reference architecture are available.)

Reference Architecture

The NetApp Mirantis Unlocked Reference Architecture can be leveraged by technical decision makers, cloud architects, OpenStack community members, and NetApp partners and implementers to help them create highly-available Mirantis OpenStack clusters with NetApp storage. You can use this document for:

  • Best Practices
  • Networking Configurations
  • Provisioning NetApp Storage with Fuel

Supported Protocols for Fuel

The Fuel plugin can deploy the following configurations on the latest MOS release:

  • Clustered Data ONTAP with NFS or iSCSI
  • E-Series/EF-Series with iSCSI

You can also use fibre-channel protocol, but manual configuration of your Cinder backend using the Deployment and Operations Guide from OpenStack@NetApp (Liberty Version) is required.

Manila (File-Share Service)

NetApp and Mirantis continue to contribute upstream to OpenStack Manila. While Mirantis OpenStack 8.0 does not include out-of-the-box support for Manila, Mirantis Services, facilitated by the OpenStack@NetApp team and documentation, can enable NetApp clustered Data ONTAP storage for Manila at customer request.

Where to go from here

If you’re ready to get started, you have several options:

SolidFire is now a NetApp company. For details on SolidFire with MOS, please refer to the SolidFire Partner page from Mirantis.

The post NetApp + Mirantis: Continuing a Great OpenStack Collaboration appeared first on Mirantis | The Pure Play OpenStack Company.

by Guest Post at May 19, 2016 04:44 AM

Lars Kellogg-Stedman

Connecting another vm to your tripleo-quickstart deployment

Let's say that you have set up an environment using tripleo-quickstart and you would like to add another virtual machine to the mix that has both "external" connectivity ("external" in quotes because I am using it in the same way as the quickstart does w/r/t the undercloud) and connectivity to the overcloud nodes. How would you go about setting that up?

For a concrete example, let's presume you have deployed an environment using the default tripleo-quickstart configuration, which looks like this:

  - name: control_0
    flavor: control

  - name: compute_0
    flavor: compute

extra_args: >-
  --neutron-network-type vxlan
  --neutron-tunnel-types vxlan

network_isolation: true

That gets you one controller, one compute node, and enables network isolation. When your deployment is complete, networking from the perspective of the undercloud looks like this:

  • eth0 is connected to the host's brext bridge and gives the undercloud NAT access to the outside world. The interface will have an address on the network.

  • eth1 is connected to the host's brovc bridge, which is the internal network for the overcloud. The interface is attached to the OVS bridge br-ctlplane.

The br-ctlplane bridge has the address

And your overcloud environment probably looks something like this:

[stack@undercloud ~]$ nova list
| ID    ...| Name                    | Status |...| Networks           |
| 32f6ec...| overcloud-controller-0  | ACTIVE |...| ctlplane= |
| d98474...| overcloud-novacompute-0 | ACTIVE |...| ctlplane= |

We want to set up a new machine that has the same connectivity as the undercloud.

Upload an image

Before we can boot a new vm we'll need an image; let's start with the standard CentOS 7 cloud image. First we'll download it:

curl -O

Let's add a root password to the image and disable cloud-init, since we're not booting in a cloud environment:

virt-customize -a CentOS-7-x86_64-GenericCloud.qcow2 \
  --root-password password:changeme \
  --run-command "yum -y remove cloud-init"

Now let's upload it to libvirt:

virsh vol-create-as oooq_pool centos-7-cloud.qcow2 8G \
  --format qcow2 \
  --allocation 0
virsh vol-upload --pool oooq_pool centos-7-cloud.qcow2 \

Boot the vm

I like to boot from a copy-on-write clone of the image, so that I can reuse the base image multiple times or quickly revert to a pristine state. Let's first create that clone:

virsh vol-create-as oooq_pool myguest.qcow2 10G \
  --allocation 0 --format qcow2 \
  --backing-vol centos-7-cloud.qcow2 \
  --backing-vol-format qcow2

And then boot our vm:

virt-install --disk vol=oooq_pool/myguest.qcow2,bus=virtio \
  --import \
  -r 2048 -n myguest --cpu host \
  --os-variant rhel7 \
  -w bridge=brext,model=virtio \
  -w bridge=brovc,model=virtio \
  --serial pty \

The crucial parts of the above command are the two -w ... arguments, which create interfaces attached to the named bridges.

We can now connect to the console and log in as root:

$ virsh console myguest
localhost login: root

We'll see that the system already has an ip address on the "external" network:

[root@localhost ~]# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:7f:5c:5a brd ff:ff:ff:ff:ff:ff
    inet brd scope global dynamic eth0
       valid_lft 3517sec preferred_lft 3517sec
    inet6 fe80::5054:ff:fe7f:5c5a/64 scope link 
       valid_lft forever preferred_lft forever

And we have external connectivity:

[root@localhost ~]# ping -c1
PING ( 56(84) bytes of data.
64 bytes from ( icmp_seq=1 ttl=56 time=20.6 ms

--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 20.684/20.684/20.684/0.000 ms

Let's give eth1 an address on the ctlplane network:

[root@localhost ~]# ip addr add dev eth1
[root@localhost ~]# ip link set eth1 up

Now we can access the undercloud:

[root@localhost ~]# ping -c1
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.464 ms

--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.464/0.464/0.464/0.000 ms

As well as all of the overcloud hosts using their addresses on the same network:

[root@localhost ~]# ping -c1
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=64 time=0.464 ms

--- ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.464/0.464/0.464/0.000 ms

Allocating an address using DHCP

In the above instructions we've manually assigned an ip address on the ctlplane network. This works fine for testing, but it could ultimately prove problematic if neutron were to allocate the same address to another overcloud host. We can use neutron to configure a static dhcp lease for our new host.

First, we need the MAC address of our guest:

virthost$ virsh dumpxml myguest |
  xmllint --xpath '//interface[source/@bridge="brovc"]' -
<interface type="bridge">
  <mac address="52:54:00:42:d6:c2"/>
  <source bridge="brovc"/>
  <target dev="tap9"/>
  <model type="virtio"/>
  <alias name="net1"/>
  <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0"/>
</interface>
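
If xmllint isn't available on the virt host, the MAC can be scraped with plain grep and sed. The XML fragment below is a trimmed copy of the dumpxml output above:

```shell
# Pull the MAC address of the brovc-attached interface from a saved
# domain XML fragment (trimmed from the dumpxml output above).
xml='<interface type="bridge">
  <mac address="52:54:00:42:d6:c2"/>
  <source bridge="brovc"/>
</interface>'

printf '%s\n' "$xml" |
    grep -o 'mac address="[^"]*"' |
    sed 's/mac address="\(.*\)"/\1/'
# 52:54:00:42:d6:c2
```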

And then on the undercloud we run neutron port-create to create a port and associate it with our MAC address:

[stack@undercloud]$ neutron port-create --mac-address 52:54:00:42:d6:c2 ctlplane

Now if we run dhclient on our guest, it will acquire a lease from the neutron-managed DHCP server:

[root@localhost]# dhclient -d eth1
Internet Systems Consortium DHCP Client 4.2.5
Copyright 2004-2013 Internet Systems Consortium.
All rights reserved.
For info, please visit

Listening on LPF/eth1/52:54:00:42:d6:c2
Sending on   LPF/eth1/52:54:00:42:d6:c2
Sending on   Socket/fallback
DHCPREQUEST on eth1 to port 67 (xid=0xc90c0ba)
DHCPACK from (xid=0xc90c0ba)
bound to -- renewal in 42069 seconds.

We can make this persistent by creating /etc/sysconfig/network-scripts/ifcfg-eth1:

[root@localhost]# cd /etc/sysconfig/network-scripts
[root@localhost]# sed s/eth0/eth1/g ifcfg-eth0 > ifcfg-eth1
[root@localhost]# ifup eth1
Determining IP information for eth1... done.
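
For reference, the resulting ifcfg-eth1 ends up looking roughly like this (values are illustrative; your generated file will mirror whatever ifcfg-eth0 contains):

```
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet
```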

by Lars Kellogg-Stedman at May 19, 2016 04:00 AM

May 18, 2016

Alessandro Pilotti

Open vSwitch 2.5 on Hyper-V (OpenStack) – Part 1

We are happy to announce the availability of Open vSwitch 2.5 (OVS) for Microsoft Hyper-V Server 2012, 2012 R2 and 2016 (technical preview) thanks to the joint effort of Cloudbase Solutions, VMware and the rest of the Open vSwitch community.

The OVS 2.5 release includes the Open vSwitch CLI tools and services (e.g. ovsdb-server, ovs-vswitchd, ovs-vsctl, ovs-ofctl, etc.), and an updated version of the OVS Hyper-V virtual switch forwarding extension, providing fully interoperable GRE, VXLAN and STT encapsulation between Hyper-V and Linux, including KVM based virtual machines.

As usual, we also released an MSI installer that takes care of the Windows services for ovsdb-server and ovs-vswitchd daemons along with all the required binaries and configurations.

All the Open vSwitch code is available as open source here:

Supported Windows operating systems:

  • Windows Server and Hyper-V Server 2012 and 2012 R2.
  • Windows Server and Hyper-V Server 2016 (technical preview).
  • Windows 8, 8.1 and 10.


Installing Open vSwitch on Hyper-V

The entire installation process is seamless. Download our installer and run it. You will be welcomed by the following screen:



Click “Next”, accept the license, click “Next” again and you’ll have the option to install both the Hyper-V virtual switch extension driver and the command line tools. If you want to install only the command line tools (in order to be able to connect to a Linux or Windows server), just deselect the driver option.


Open vSwitch 2.5 Hyper-V Setup on Windows


Click “Next” followed by “Install” and the installation will start. You will have to confirm that you want to install the signed kernel driver and the process will be completed in a matter of a few seconds, generating an Open vSwitch database and starting the ovsdb-server and ovs-vswitchd services.











The installer also adds the command line tools folder to the system path, available after the next logon or CLI shell execution.


Unattended installation

Fully unattended installation is also available (if you already have accepted/imported our certificate). This helps to install Open vSwitch with Windows GPOs, Puppet, Chef, SaltStack, DSC or any other automated deployment solution:

msiexec /i openvswitch-hyperv-2.5.0.msi /l*v log.txt


Configuring Open vSwitch on Windows

Let us assume that we have the following environment: a host with four Ethernet cards, on top of one of which we shall bind a Hyper-V virtual switch.

The list of adapters:

PS C:\package> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
port3                     Intel(R) 82574L Gigabit Network Co...#3      26 Up           00-0C-29-40-8B-EA         1 Gbps
nat                       Intel(R) 82574L Gigabit Network Co...#4      27 Up           00-0C-29-40-8B-E0         1 Gbps
port2                     Intel(R) 82574L Gigabit Network Co...#2      18 Up           00-0C-29-40-8B-D6         1 Gbps
port1                     Intel(R) 82574L Gigabit Network Conn...      17 Up           00-0C-29-40-8B-CC         1 Gbps

Create a Hyper-V external virtual switch. Remember that if you want to take advantage of GRE, VXLAN or STT tunneling you will have to create an external virtual switch with the AllowManagementOS flag set to false.

For example:

PS C:\package> New-VMSwitch -Name vSwitch -NetAdapterName port1 -AllowManagementOS $false

Name    SwitchType NetAdapterInterfaceDescription
----    ---------- ------------------------------
vSwitch External   Intel(R) 82574L Gigabit Network Connection

To verify that the extension has been installed on our system:

PS C:\package> Get-VMSwitchExtension -VMSwitchName vSwitch -Name "Open vSwitch Extension"

Id                  : 583CC151-73EC-4A6A-8B47-578297AD7623
Name                : Open vSwitch Extension
Vendor              : Open vSwitch
Version             :
ExtensionType       : Forwarding
ParentExtensionId   :
ParentExtensionName :
SwitchId            : 5844f4dd-b3d7-496c-81cb-481a64fa7f58
SwitchName          : vSwitch
Enabled             : False
Running             : False
ComputerName        : HYPERV_NORMAL_1
Key                 :
IsDeleted           : False

We can now enable the OVS extension on the vSwitch virtual switch:

PS C:\package> Enable-VMSwitchExtension -VMSwitchName vSwitch -Name "Open vSwitch Extension"

Id                  : 583CC151-73EC-4A6A-8B47-578297AD7623
Name                : Open vSwitch Extension
Vendor              : Open vSwitch
Version             :
ExtensionType       : Forwarding
ParentExtensionId   :
ParentExtensionName :
SwitchId            : 5844f4dd-b3d7-496c-81cb-481a64fa7f58
SwitchName          : vSwitch
Enabled             : True
Running             : True
ComputerName        : HYPERV_NORMAL_1
Key                 :
IsDeleted           : False

Please note that when you enable the extension, the virtual switch will stop forwarding traffic until it is configured (adding the Ethernet adapter under a bridge).


PS C:\package> ovs-vsctl.exe add-br br-port1
PS C:\package> ovs-vsctl.exe add-port br-port1 port1

Let us talk in more detail about the two commands issued above.

The first command:

PS C:\package> ovs-vsctl.exe add-br br-port1

will add a new adapter on the host, which is disabled by default:

PS C:\package> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
br-port1                  Hyper-V Virtual Ethernet Adapter #2          47 Disabled     00-15-5D-00-62-79        10 Gbps
port3                     Intel(R) 82574L Gigabit Network Co...#3      26 Up           00-0C-29-40-8B-EA         1 Gbps
nat                       Intel(R) 82574L Gigabit Network Co...#4      27 Up           00-0C-29-40-8B-E0         1 Gbps
port2                     Intel(R) 82574L Gigabit Network Co...#2      18 Up           00-0C-29-40-8B-D6         1 Gbps
port1                     Intel(R) 82574L Gigabit Network Conn...      17 Up           00-0C-29-40-8B-CC         1 Gbps

This adapter can be used as an IP-able device:

PS C:\package> Enable-NetAdapter br-port1
PS C:\package> New-NetIPAddress -IPAddress -InterfaceAlias br-port1 -PrefixLength 24

IPAddress         :
InterfaceIndex    : 47
InterfaceAlias    : br-port1
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Tentative
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : ActiveStore

IPAddress         :
InterfaceIndex    : 47
InterfaceAlias    : br-port1
AddressFamily     : IPv4
Type              : Unicast
PrefixLength      : 24
PrefixOrigin      : Manual
SuffixOrigin      : Manual
AddressState      : Invalid
ValidLifetime     : Infinite ([TimeSpan]::MaxValue)
PreferredLifetime : Infinite ([TimeSpan]::MaxValue)
SkipAsSource      : False
PolicyStore       : PersistentStore

The second command:

PS C:\package> ovs-vsctl.exe add-port br-port1 port1

will allow the bridge to use the actual physical NIC on which the Hyper-V vSwitch was created (port1).

Users coming from Linux will be familiar with the setup above because it is similar to a Linux bridge.


Current limitations:

  • We currently support a single Hyper-V virtual switch in our forwarding extension.
  • Support for multiple host NICs with LACP is experimental in this release.


OpenStack Integration with Open vSwitch on Windows

OpenStack is a very common use case for Open vSwitch on Hyper-V. The following example is based on a DevStack Mitaka All-in-One deployment on Ubuntu 14.04 LTS with a Hyper-V compute node, but the concepts and the following steps apply to any OpenStack deployment.

Let us install our DevStack node. Here is a sample local.conf configuration:

ubuntu@ubuntu:~/devstack$ cat local.conf 
# Set this to your management IP

#Services to be started
disable_service n-net

enable_service rabbit mysql
enable_service key
enable_service n-api n-crt n-obj n-cond n-sch n-cauth n-cpu
enable_service neutron q-svc q-agt q-dhcp q-l3 q-meta q-fwaas q-lbaas 
enable_service horizon
enable_service g-api g-reg

disable_service heat h-api h-api-cfn h-api-cw h-eng
disable_service cinder c-api c-vol c-sch
disable_service tempest








min_pool_size = 5
max_pool_size = 50
max_overflow = 50


ubuntu@ubuntu:~/devstack$ ifconfig eth3
eth3      Link encap:Ethernet  HWaddr 00:0c:29:25:db:8c  
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::20c:29ff:fe25:db8c/64 Scope:Link
          RX packets:2209 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1007 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:336185 (336.1 KB)  TX bytes:153402 (153.4 KB)

After DevStack finishes installing we can add some Hyper-V VHD or VHDX images to Glance, for example our Windows Server 2012 R2 evaluation image. Additionally, since we are using VXLAN, the default guest MTU should be set to 1450. This can be done via DHCP option if the guest supports it, as described here.
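
With neutron's DHCP agent this typically means pointing dnsmasq at an extra configuration file advertising DHCP option 26 (interface-mtu). The paths below are the usual defaults; adjust them for your deployment:

```ini
# /etc/neutron/dhcp_agent.ini
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
dhcp-option-force=26,1450
```

Restart the DHCP agent after the change so newly booted guests pick up the reduced MTU.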

Now let us move to the Hyper-V node. First we have to download the latest OpenStack compute installer:

PS C:\package> Start-BitsTransfer

Full steps on how to install and configure OpenStack on Hyper-V are available here: OpenStack on Windows installation.

In our example, the Hyper-V node will use the following adapter to connect to the OpenStack environment:

Ethernet adapter br-port1:

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::9c1a:f185:bb09:62e2%47
   IPv4 Address. . . . . . . . . . . :
   Subnet Mask . . . . . . . . . . . :
   Default Gateway . . . . . . . . . :

This is the internal adapter bound to the vSwitch virtual switch, as created during the previous steps (ovs-vsctl add-br br-port1).

We can now verify our deployment by taking a look at the Nova services and Neutron agents status in the OpenStack controller and ensuring that they are up and running:

ubuntu@ubuntu:~/devstack$ nova service-list
| Id | Binary           | Host            | Zone     | Status  | State | Updated_at                 | Disabled Reason |
| 5  | nova-conductor   | ubuntu          | internal | enabled | up    | 2016-04-26T20:09:44.000000 | -               |
| 6  | nova-cert        | ubuntu          | internal | enabled | up    | 2016-04-26T20:09:39.000000 | -               |
| 7  | nova-scheduler   | ubuntu          | internal | enabled | up    | 2016-04-26T20:09:45.000000 | -               |
| 8  | nova-consoleauth | ubuntu          | internal | enabled | up    | 2016-04-26T20:09:46.000000 | -               |
| 9  | nova-compute     | ubuntu          | nova     | enabled | up    | 2016-04-26T20:09:48.000000 | -               |
| 10 | nova-compute     | hyperv_normal_1 | nova     | enabled | up    | 2016-04-26T20:09:39.000000 | -               |
ubuntu@ubuntu:~/devstack$ neutron agent-list
| id                                   | agent_type         | host            | availability_zone | alive | admin_state_up | binary                    |
| 1bb8eccc-ad8c-43c2-a54e-d84c6cd7acd4 | DHCP agent         | ubuntu          | nova              | :-)   | True           | neutron-dhcp-agent        |
| 3d89e79d-3cb4-4a10-ae01-773b86f83fb2 | Loadbalancer agent | ubuntu          |                   | :-)   | True           | neutron-lbaas-agent       |
| 7777a901-0c58-4180-8d01-4ea3296621a4 | Open vSwitch agent | ubuntu          |                   | :-)   | True           | neutron-openvswitch-agent |
| 93d6390a-19d2-4c79-8f76-90736bc47f5f | HyperV agent       | hyperv_normal_1 |                   | :-)   | True           | neutron-hyperv-agent      |
| c3af1d4b-5bba-47b0-b0db-b3c0d49bb41a | Metadata agent     | ubuntu          |                   | :-)   | True           | neutron-metadata-agent    |
| ec9bc28c-a5ee-4733-8b9c-3a1f99c42f08 | L3 agent           | ubuntu          | nova              | :-)   | True           | neutron-l3-agent          |

Next we can disable the Windows Hyper-V agent, which is not needed since we use the neutron Open vSwitch agent instead.

From a command prompt (cmd.exe), issue the following commands:

C:\package>sc config "neutron-hyperv-agent" start=disabled
[SC] ChangeServiceConfig SUCCESS

C:\package>sc stop "neutron-hyperv-agent"

SERVICE_NAME: neutron-hyperv-agent
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

We need to create a new service called neutron-ovs-agent and put its configuration options in C:\Program Files\Cloudbase Solutions\OpenStack\Nova\etc\neutron_ovs_agent.conf. From a command prompt:

C:\Users\Administrator>sc create neutron-ovs-agent binPath= "\"C:\Program Files\Cloudbase Solutions\OpenStack\Nova\bin\OpenStackServiceNeutron.exe\" neutron-hyperv-agent \"C:\Program Files\Cloudbase Solutions\OpenStack\Nova\Python27\Scripts\neutron-openvswitch-agent.exe\" --config-file \"C:\Program Files\Cloudbase Solutions\OpenStack\Nova\etc\neutron_ovs_agent.conf\"" type= own start= auto error= ignore depend= Winmgmt displayname= "OpenStack Neutron Open vSwitch Agent Service" obj= LocalSystem
[SC] CreateService SUCCESS

C:\Users\Administrator>notepad "c:\Program Files\Cloudbase Solutions\OpenStack\Nova\etc\neutron_ovs_agent.conf"

C:\Users\Administrator>sc start neutron-ovs-agent

SERVICE_NAME: neutron-ovs-agent
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 2  START_PENDING
                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x1
        WAIT_HINT          : 0x0
        PID                : 2740
        FLAGS              :

Note: starting with the next Nova Hyper-V MSI installer version, manually creating a service for the OVS agent will no longer be necessary.

Here is the content of the neutron_ovs_agent.conf file:

policy_file=C:\Program Files\Cloudbase Solutions\OpenStack\Nova\etc\policy.json
tunnel_types = vxlan
local_ip =
tunnel_bridge = br-tun
integration_bridge = br-int
tenant_network_type = vxlan
enable_tunneling = true

Now if we run ovs-vsctl show, we can see a VXLAN tunnel in place:

PS C:\> ovs-vsctl.exe show
    Bridge br-int
        fail_mode: secure
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-0e0e0e01"
            Interface "vxlan-0e0e0e01"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="", out_key=flow, remote_ip=""}
    Bridge "br-port1"
        Port "port1"
            Interface "port1"
        Port "br-port1"
            Interface "br-port1"
                type: internal

After spawning a Nova instance on the Hyper-V node you should see:

PS C:\> get-vm

Name              State   CPUUsage(%) MemoryAssigned(M) Uptime   Status
----              -----   ----------- ----------------- ------   ------
instance-00000003 Running 0           512               00:01:09 Operating normally

PS C:\Users\Administrator> Get-VMConsole instance-00000003
PS C:\> ovs-vsctl.exe show
    Bridge br-int
        fail_mode: secure
        Port "f44f4971-4a75-4ba8-9df7-2e316f799155"
            tag: 1
            Interface "f44f4971-4a75-4ba8-9df7-2e316f799155"
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-0e0e0e01"
            Interface "vxlan-0e0e0e01"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="", out_key=flow, remote_ip=""}
    Bridge "br-port1"
        Port "port1"
            Interface "port1"
        Port "br-port1"
            Interface "br-port1"
                type: internal

In this example, “f44f4971-4a75-4ba8-9df7-2e316f799155” is the OVS port name associated with the instance-00000003 VM vNIC. You can find out the details by running the following PowerShell cmdlet:

PS C:\Users\Administrator> Get-VMByOVSPort -OVSPortName "f44f4971-4a75-4ba8-9df7-2e316f799155"
ElementName                          : instance-00000003

The VM instance-00000003 got an IP address from the neutron DHCP agent, with fully functional networking between KVM and Hyper-V hosted virtual machines!

This is everything you need to get started with OpenStack, Hyper-V and OVS.

In the next blog post we will show you how to manage Hyper-V on OVS without OpenStack using a VXLAN tunnel.

The post Open vSwitch 2.5 on Hyper-V (OpenStack) – Part 1 appeared first on Cloudbase Solutions.

by Alin Serdean at May 18, 2016 08:44 PM

OpenStack Superuser

Oh, the places you’ll go with OpenStack!

The prolific Dr. Seuss published “Oh, the Places You’ll Go!” in 1990; it was his final and perhaps most beloved book. This handy guide to life is frequently given as a gift to newborns entering the world, or later to those entering the “real world” at high school or university graduation.

You may have heard of the book, but what you may not know is that when the good doctor started out, he had to endure 27 rejections before finally finding a publisher willing to back him. That kind of dogged persistence is easier said than done, and no doubt informed the wisdom he passed on in his final work.

"Step with care and great tact and remember that Life’s a Great Balancing Act." -- Dr. Seuss

I can’t help but feel that all of us on the OpenStack journey can benefit from his wisdom. Had we received a copy when OpenStack was born six years ago, I imagine we’d have been more prepared for the many twists and turns along the way.

But now that we’re running thousands of production workloads, it’s official: we’re graduating! Real world, here we come. Sure, being an adult means navigating the bang-ups, hang-ups and dead-ends the book mentions, but I think we can all get through this together. Being popular powered us through the early years, but now that the world is relying on us, it’s time to grow up (just a little) and think about the impact we can make in the world.

Just look at all of the exciting places we’re going:

From the phone in your pocket

The telecom industry is undergoing a massive shift, away from hundreds of proprietary devices in thousands of central offices accumulated over decades, to a much more efficient and flexible software plus commodity hardware approach. While some carriers like AT&T have already begun routing traffic from the 4G networks over OpenStack powered clouds to millions of cellphone users, the major wave of adoption is coming with the move to 5G, including plans from AT&T, Telefonica, SK Telekom, and Verizon.

We are on the cusp of a revolution that will completely re-imagine what it means to provide services in the trillion dollar telecom industry, with billions of connected devices riding on OpenStack-powered infrastructure in just a few years.

To the living room socket

The titans of TV like Comcast, DirecTV, and Time Warner Cable all rely on OpenStack to bring the latest entertainment to our homes efficiently, and innovators like DigitalFilm Tree are producing that content faster than ever thanks to cloud-based production workflows.

Your car, too, will get smart

Speaking of going places, back here on earth many of the world’s top automakers, such as BMW and the Volkswagen group, which includes Audi, Lamborghini, and even Bentley, are designing the future of transportation using OpenStack and big data. The hottest trends to watch in the auto world are zero-emissions electric cars and self-driving cars. As with the “smart city,” a proliferation of sensors plus connectivity calls for distributed systems to bring it all together, creating a huge opportunity for OpenStack.

And your bank will take part

Money moves faster than ever, with digital payments from startups and established players alike competing for consumer attention. Against this backdrop of enormous market change, banks must meet an increasingly rigid set of regulatory rules, not to mention growing security threats. To empower their developers to innovate while staying diligent on regs and security, financial leaders like PayPal, FICO, TD Bank, American Express, and Visa are adopting OpenStack.

Your city must keep the pace

Powering the world’s cities is a complex task and here OpenStack is again driving automation, this time in the energy sector. State Grid Corporation, the world’s largest electric utility, serves over 120 million customers in China while relying on OpenStack in production.

Looking to the future, cities will be transformed by the proliferation of fast networks combined with cheap sensors. Unlocking the power of this mix are distributed systems, including OpenStack, to process, store, and move data. Case in point: tcpcloud in Prague is helping introduce “smart city” technology by utilizing inexpensive Raspberry Pis embedded in street poles, backed by a distributed system based on Kubernetes and OpenStack. These systems give city planners insight into traffic flows of both pedestrians and cars, and even measure weather quality. By routing not just packets but people, cities are literally load balancing their way to lower congestion and pollution.

From inner to outer space

The greatest medical breakthroughs of the next decade will come from analyzing massive data sets, thanks to the proliferation of distributed systems that put supercomputer power into the hands of every scientist. And OpenStack has a huge role to play empowering researchers all over the globe: from Melbourne to Madrid, Chicago to Chennai, or Berkeley to Beijing, everywhere you look you’ll find OpenStack.

To explore this world, I recently visited the Texas Advanced Computing Center (TACC) at the University of Texas at Austin, where I toured a facility that houses one of the top 10 supercomputers in the world, code-named “Stampede.”

But what really got me excited about the future was the sight of two large OpenStack clusters: one called Chameleon, and the newest addition, Jetstream, which put the power of more than 1,000 nodes and more than 15,000 cores into the hands of scientists at 350 universities. In fact, the Chameleon cloud was recently used in a class at the University of Arizona by students looking to discover exoplanets. Perhaps the next Neil deGrasse Tyson is out there using OpenStack to find a planet to explore for NASA’s Jet Propulsion Laboratory.

Where should we go next?

Mark Collier is an OpenStack co-founder and currently COO of the OpenStack Foundation. This article was first published in Superuser Magazine, distributed at the Austin Summit.

Cover illustration: Joe Basnight

by Mark Collier at May 18, 2016 07:02 PM

OpenStack Nova Developer Rollup

API-Ref Daily Update


API-Ref status for May 18, 2016


Click here for more information about the API-Ref project.

by auggy at May 18, 2016 05:37 PM

OpenStack @ NetApp

NetApp+Mirantis - Continuing a Great OpenStack Collaboration

NetApp and Mirantis are pleased to announce an updated NetApp Mirantis Unlocked Reference Architecture for the Liberty-based Mirantis OpenStack 8.0 with NetApp's leading enterprise storage platforms, clustered Data ONTAP and E-Series. As part of this effort, we have developed, validated, and certified a Fuel plugin for NetApp. This combination of resources makes it easier for you to architect and deploy OpenStack on a NetApp-based storage system (and vice versa).

NetApp on Mirantis OpenStack 8.0

The modular nature of OpenStack makes it possible to run OpenStack services on your desired hardware using the appropriate drivers. In the case of clustered Data ONTAP, the driver has been available in previous releases, but for MOS 8.0 it has been hardened to provide even more reliability while continuing to provide the same feature-rich enterprise backend capabilities. Using clustered Data ONTAP as a storage backend for OpenStack provides a unified architecture, high storage efficiency, continuous operations, secure multi-tenancy, data protection, seamless scaling, and more.

E-Series for OpenStack enables simple, low-touch administration of flexible and reliable SAN storage, and provides extreme capacity and density, high performance (low latency, high IOPS), and availability. In the latest release of Mirantis OpenStack, E-Series volume group support, thin provisioning, and oversubscription capabilities have been introduced.

We strongly recommend that you use the latest version of Mirantis OpenStack (MOS 8.0, Liberty) and its NetApp Fuel plugin to avail yourself of the latest features. If you still want to use the Kilo release, the MOS 7.0 Fuel plugin and MOS 7.0 reference architecture are available.

Reference Architecture

The NetApp Mirantis Unlocked Reference Architecture can be leveraged by technical decision makers, cloud architects, OpenStack community members, and NetApp partners and implementers to help them create highly-available Mirantis OpenStack clusters with NetApp storage. You can use this document for:

  • Best Practices
  • Networking Configurations
  • Provisioning NetApp Storage with Fuel

Supported Protocols for Fuel

The Fuel plugin can deploy the following configurations on the latest MOS release:

  • Clustered Data ONTAP with NFS or iSCSI
  • E-Series/EF-Series with iSCSI

You can also use the Fibre Channel protocol, but this requires manual configuration of your Cinder backend following the Deployment and Operations Guide (Liberty version).

Manila (File-Share Service)

NetApp and Mirantis continue to contribute upstream to OpenStack Manila. While Mirantis OpenStack 8.0 does not include out-of-the-box support for Manila, Mirantis Services, facilitated by the OpenStack@NetApp team and documentation, can enable NetApp clustered Data ONTAP storage for Manila at customer request.


For details on SolidFire with MOS, please refer to the SolidFire Partner page from Mirantis.

Where to go from here

If you're ready to get started, you have several options:

May 18, 2016 04:42 PM

May 17, 2016

OpenStack Nova Developer Rollup

Newton Design Summit Recap, Part 1: Scheduler and Cells

The OpenStack Design Summit was held April 25-29, 2016, in Austin, TX. OpenStack contributors got together to discuss key issues and to plan for the Newton release. This is the first in a series providing a summary of key items that were discussed by the Nova team during that time.

Note that these recaps will be Nova-centric, focusing primarily on sessions and discussions pertaining to Nova interests. Also note that these are not comprehensive recaps, and links to additional resources will be provided later in the articles for reference.

The Nova PTL, Matt Riedemann, wrote up summaries for key Nova sessions. For each item, I’ll provide the link to the openstack-dev mailing list archive of his summary and some TL;DR bullet points.

Nova Newton Priorities Tracking

The Nova team maintains an etherpad with an updated list of items to review. This helps the team maintain focus on key issues during the cycle.

All Open Specifications

Here is a link to all open specifications in Nova that need reviewing. As you can see, there are a lot, so cross referencing with the Priorities etherpad linked above is useful to narrow things down.



Scheduler

The long-term road map is to move the Scheduler out of Nova and to modularize it so that it can use external placement-decision libraries. This is a long process, with incremental changes happening each cycle, keeping backwards compatibility in mind and minimizing the impact on end users as these changes happen.

Key Points

PCI and NUMA database differences need to be addressed in order to move forward

  • NUMA data is stored differently than PCI data in the database
    • NUMA topology, including compute node information (capacity, allocation, usage, etc.), is stored as a JSON blob in a single field, compute_nodes.NUMA_topology
    • PCI device information is stored in a separate table
    • PCI requests for an instance are stored in instance_extra
  • The goal is to split these into an inventories table and an allocations table
  • Callers should not be aware of any backend changes; the resource tracker still needs to report the same values

The allocations/inventories table will go into the API database

  • Deployers and operators were already unhappy with a new API db, so yet another new db would make them even more unhappy
  • Ultimately the Scheduler will be split out into its own service, but this is an interim solution
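To make the inventories/allocations split concrete, here is a small Python sketch (illustrative only; the field names and numbers are invented and do not reflect Nova's actual schema) of how a unified free-capacity figure can be derived from the two tables:

```python
# Illustrative stand-ins for the proposed inventories and
# allocations tables (invented data, not Nova's real schema).
inventories = [
    # resource provider id, resource class, total, reserved
    {"rp": 1, "class": "VCPU", "total": 16, "reserved": 2},
    {"rp": 1, "class": "MEMORY_MB", "total": 32768, "reserved": 512},
]

allocations = [
    # consumer (instance) -> amount claimed against a provider/class
    {"rp": 1, "class": "VCPU", "consumer": "inst-a", "used": 4},
    {"rp": 1, "class": "VCPU", "consumer": "inst-b", "used": 2},
]

def free_capacity(rp, resource_class):
    """Free = (total - reserved) - sum of allocations, regardless of
    which backend (NUMA blob, PCI table) the data originally came from."""
    inv = next(i for i in inventories
               if i["rp"] == rp and i["class"] == resource_class)
    used = sum(a["used"] for a in allocations
               if a["rp"] == rp and a["class"] == resource_class)
    return inv["total"] - inv["reserved"] - used

print(free_capacity(1, "VCPU"))  # (16 - 2) - (4 + 2) = 8
```

Whatever form the NUMA or PCI data takes underneath, callers only ever see this same inventory/allocation arithmetic.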

Capabilities are still undefined

  • Proposal: a capability is a single value representing a specific feature
  • Proposal: Create a set of enum classes that very distinctly describe what a particular capability is
    • We need to consolidate the different values from different resources
      • e.g., libvirt returns a feature flag, VMware returns something else; we need to provide a unified value to the caller
  • This is what is currently proposed:
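While the concrete proposal isn't reproduced here, a rough Python sketch of the enum-class idea (all names and flag strings below are invented for illustration, not Nova's actual classes) could look like:

```python
from enum import Enum

# Hypothetical shared capability enum; names are illustrative.
class Capability(Enum):
    SNAPSHOT = "snapshot"
    LIVE_MIGRATION = "live_migration"

def normalize(driver, raw_flag):
    """Map each hypervisor's native feature report onto the shared enum,
    so callers always see one unified value (flag strings are made up)."""
    mapping = {
        ("libvirt", "VIR_SNAPSHOT"): Capability.SNAPSHOT,
        ("vmware", "snapshotSupported"): Capability.SNAPSHOT,
    }
    return mapping[(driver, raw_flag)]

print(normalize("libvirt", "VIR_SNAPSHOT"))  # Capability.SNAPSHOT
```

The point of the enum is exactly this consolidation: two drivers reporting the same feature in different vocabularies resolve to a single, distinctly defined value.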


Completed in Mitaka

  • Resource Providers Database Schema
  • Online data migration for inventory (capacity, reserve amounts, cpu, disk, etc)

Slated for Newton

  • Migrating allocation fields for instances
    • blueprint:  resource-providers-allocations
  • Define what a capability is (and isn’t) and define a standard representation
    • blueprint: standardize-capabilities
  • Generic Resource Pool Object Modeling + REST API
    • blueprint:  generic-resource-pools
  • Cleanup around PCI device handling and migration of PCI fields
  • Migration of NUMA fields

Additional Links

Cells v2


One of the biggest challenges Nova faces is its own upper limit. The concept of “cells”, the idea of many independent compute “containers” running simultaneously, was born as a solution to this problem. Unfortunately, the Cells v1 effort was not successful at addressing this issue, but some important lessons were learned. Cells v2 is the current attempt to solve the compute scalability issue and, as a handy side-effect, to generally improve the Nova code base.

Andrew Laski gave a fantastic talk providing an overview of Cells v1 and the plan for Cells v2 in Newton. This talk provides some great context and high-level architecture. You can view the talk here.

One key difference between Cells v1 and Cells v2 is that Cells v1 implemented the cells architecture as an alternate path (with all the complexity that maintaining an alternate path brings), whereas Cells v2 will be *the only* path. The default will be a single-cell deployment, with one Compute instance living in one cell.

Key concepts:

  • The API cell is the cell responsible for running the Nova API to handle requests to instances (even ones located in other cells)
    • Has its own API Database
    • The Scheduler lives in this cell and manages instance “scheduling” from here
  • Cell 0 is a special cell that lives outside of the regular cell hierarchy.
    • It is the default location in the event of instance “scheduling” failures
    • Cell 0 is also the default cell in a single-cell deployment (i.e., Devstack)
    • Cell 0 can be combined with the API cell for simplicity
  • In Cells v2, managing instance requests requires additional data in order to route messages to the appropriate cell.
    • Instead of just looking up the hostname of the compute node where an instance lives (what we currently do), we will also need the database and queue connection information.
  • Database is split between “local” and “global” data.
    • Local – stuff that only the compute nodes within the cell need to know about
    • Global – stuff the Nova API (living in the API cell) needs to know about to handle instance requests
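A toy Python sketch of the extra lookup Cells v2 implies (the structures and connection strings are invented, not Nova's): instead of just a compute hostname, the API cell needs the target cell's database and message-queue endpoints to route a request:

```python
# Invented example data: per-cell connection info plus the "global"
# instance-to-cell mapping the API cell would hold.
cells = {
    "cell1": {"db": "mysql://cell1-db/nova", "mq": "rabbit://cell1-mq"},
    "cell2": {"db": "mysql://cell2-db/nova", "mq": "rabbit://cell2-mq"},
}
instance_mappings = {"inst-42": "cell1"}

def route(instance_id):
    """Return the DB and MQ connection info for the instance's cell,
    which the API needs before it can send the request anywhere."""
    cell = instance_mappings[instance_id]
    return cells[cell]

print(route("inst-42")["mq"])  # rabbit://cell1-mq
```

This illustrates the local/global split above: the mapping lives in the global (API) database, while everything behind those connection strings is local to the cell.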

Completed in Mitaka

  • Created the API database
  • Database connection switching – tell Nova which database connection to use
  • Tools to help with upgrades
  • Scheduling Interaction focused items
    • Implement BuildRequest object + storage in database
      • Persist instance data when a boot request is received by the API but the instance hasn’t been created and written to the database yet
    • RequestSpec Object to persist instance details needed for allocation
      • Used internally by the BuildRequest object and contains the specific “scheduling” details
    • Make cell id nullable
      • A null cell_id defines a state where a boot request was received, but the instance still needs to be allocated to a cell. Because there is no cell_id, information needs to come from the BuildRequest object rather than from the database.

Slated for Newton

  • Data Migration to the API database
    • Flavors
    • Aggregates
    • Key Pairs
    • Quotas
  • Implementation of Cell 0
  • Creation of additional upgrade tools
  • Message Queue connection switching
    • specify which message queue a message should go to
  • Start work on multiple cells support

Additional Links

Editor’s Note: Please feel free to contact me or post comments with any corrections!

by auggy at May 17, 2016 11:21 PM


Use case examples of new Heat Kilo commands

To dive deeper into the details, here is an article giving more information on some of the new Heat Kilo commands.

Use case example of the new “hooks” command

Very handy for configuring your instances, the “hooks” feature lets you pause selected resources until a configuration step has been performed, so you can orchestrate your Heat template as you wish and avoid errors linked to ordering. A paused resource resumes once its hook is cleared with the heat hook-clear command.

Here is a way to use it:

1 – Create a dedicated environment file (here: env.yml) in which you define the resources and the desired hooks (pre-create, pre-update, pre-delete, post-create, post-update and post-delete). The resource names must match those used in your template:

Stack env.yml

    resource_registry:
      resources:
        wp_server:
          hooks: pre-create
        wp_server1:
          hooks: pre-update
        wp_server2:
          hooks: [pre-create, pre-update]

2 – Create a Heat template which will call the resources you’ve just created (here: test-hooks.yml):

Stack test-hooks.yml

    heat_template_version: 2013-05-23

    description: >
      Heat WordPress template, demonstrating Provider Resource.

    parameters:
      user_key:
        type: string
        description: Name of a KeyPair to enable SSH access to the instance

    resources:
      wp_server:
        type: My::WP::Server
        properties:
          key_name: {get_param: user_key}

      wp_server1:
        type: My::WP::Server1
        properties:
          key_name: {get_param: user_key}

      wp_server2:
        type: My::WP::Server2
        properties:
          key_name: {get_param: user_key}

3 – Run the following command: heat stack-create STACKNAME -e env.yml -f test-hooks.yml -P <parameter>=<value>

Use case example of the “digest” command:

For example, suppose I ask a user to provide a password in a string-type field, but I do not want that password forwarded in clear text to all my instances. To avoid such behavior, new_password: {digest: ['sha512', {get_param: old_password}]} will create a new SHA-512 hash of the password, which is far more secure.
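For reference, the digest function simply applies the named hash to its input; the equivalent operation in plain Python (illustrative only, not Heat's implementation) is:

```python
import hashlib

def digest(algorithm, value):
    # Equivalent of Heat's digest intrinsic: hash the input string
    # with the named algorithm and return the hex digest.
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()

hashed = digest("sha512", "old_password")
print(len(hashed))  # a SHA-512 hex digest is 128 characters long
```

Note that this is a one-way hash, not encryption: the instances receive a fixed-length digest from which the original password cannot be recovered.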

Use case example of the “repeat” command:

Suppose a Heat template asks the user which ports should be opened in his security group; he could answer 80, 443, 22, and so on. With this data, you can dynamically create a security group from the collected list thanks to the repeat function.
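As a sketch of that idea (the parameter and resource names here are invented, and repeat requires heat_template_version 2015-04-30 or later), the repeat function can expand a list of ports into security group rules:

```yaml
parameters:
  ports:
    type: comma_delimited_list
    default: "22,80,443"

resources:
  web_secgroup:
    type: OS::Neutron::SecurityGroup
    properties:
      rules:
        repeat:
          for_each:
            <%port%>: {get_param: ports}
          template:
            protocol: tcp
            port_range_min: <%port%>
            port_range_max: <%port%>
```

Each value in the ports list is substituted for <%port%> in the template block, producing one rule per port.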

Direct URL access to the SPICE/VNC console in the stack output parameters

The following stack output:

      get_attr: [First, console_urls, spice-html5]

provides the following output:


by Julien DEPLAIX at May 17, 2016 10:00 PM

Innovation Beta: MyCloudManager

This first version of MyCloudManager (Beta) is a stack unlike anything the team has shared with you so far. It aims to bring you a set of tools to unify, harmonize and monitor your tenant. It contains a number of different applications that help you manage your instances day by day:

  • Monitoring and Supervision
  • Log management
  • Jobs Scheduler
  • Mirror ClamAV - Antivirus
  • Repository app manager
  • Time synchronization

MyCloudManager has been completely developed by the CAT team ( Cloudwatt Automation Team).

  • it is based on a CoreOS instance
  • all applications are deployed as Docker containers on a Kubernetes infrastructure
  • the user interface is built with React
  • you can install and configure all the applications on your instances from the GUI, via Ansible playbooks
  • to keep your MyCloudManager as secure as possible, no port is exposed on the internet apart from port 22 for managing the stack's instances and port 1723 for PPTP VPN access


The prerequisites

Initialize the environment

Have your Cloudwatt credentials at hand and click HERE. If you are not logged in yet, you will go through the authentication screen, and then the script download will start. Thanks to it, you will be able to set up shell access to the Cloudwatt APIs.

Source the downloaded file in your shell. Your password will be requested.

$ source COMPUTE-[...]
Please enter your OpenStack Password:

Once this is done, the OpenStack command-line tools can interact with your Cloudwatt user account.

Install MyCloudManager

The 1-click

Launch MyCloudManager with the Cloudwatt 1-click via the Apps page on the Cloudwatt website: choose the MyCloudManager app and press DEPLOY.

After entering your account login and password, the launch wizard appears:


As you may have noticed, the 1-click wizard asks you to re-enter your OpenStack password (this will be fixed in a future version of MyCloudManager). You will also need your tenant ID, which is the same as the Project ID, to complete the wizard.

By default, the wizard deploys two instances of type “standard-4”. (A variety of other instance types exist to suit your various needs, allowing you to pay only for the services you use. Instances are charged by the minute and capped at their monthly price; you can find more details on the Pricing page of the Cloudwatt website.)

You must indicate the type (standard or performant) and the size of the block volume that will be attached to your stack via the volume_size parameter.

Finally, you can set the number of nodes used to distribute the load. By default, MyCloudManager is deployed on one master instance and one slave node; at most, MyCloudManager Beta can be deployed on one master instance and three slave nodes.


The 1-click handles the necessary calls to the Cloudwatt APIs:

  • Start an instance based on CoreOS,
  • Create and attach a block volume (standard or performant, as you wish),
  • Start the toolbox container,
  • Start the SkyDNS container

The stack is created automatically. You can follow its progress by clicking its name, which takes you to the Horizon console. When all modules turn “green”, the creation is finished.

Allow five minutes for the entire stack to become available.


Finalize VPN access

In order to have access to all functionalities, we have set up a VPN connection.

Here are the steps to follow:

  • First, retrieve the output information from your stack.


Windows 7

  • Now create a VPN connection from your computer: go to “Control Panel > All Control Panel Items > Network and Sharing Center” and click “Set up a new connection…”.


  • Now enter the information retrieved from the stack outputs: first the FloatingIP, then the login and password provided.


After following this procedure you can now start the VPN connection.


Windows 10

  • Go to Settings > Network and Internet > Virtual Private Network


  • Now enter the information retrieved from the stack outputs: first the FloatingIP, then the login and password provided.


After following this procedure you can now start the VPN connection.


You can now access the MyCloudManager administration interface via the URL http://manager.default.svc.mycloudmanager and begin to reap the benefits.

It’s (already) done !


Access to the interface and the various applications is via DNS names: a SkyDNS container is launched at startup, providing all the names in place. You can reach the different applications' web interfaces by clicking Go or by requesting the URL directly (e.g. http://zabbix.default.svc.mycloudmanager/).

We have attached a block volume to your stack to store all MyCloudManager data. The volume is mounted as /dev/vdb on the master instance and on all nodes of your MyCloudManager, which makes the stack much more robust. Since the data is synchronized across all nodes, applications have access to their data regardless of the node on which they are created.

Interface Overview

Here is the home page of MyCloudManager, each thumbnail representing an application ready to be launched. To be as scalable and flexible as possible, all MyCloudManager applications are Docker containers.


A menu at the top left of the page lets you move through the different sections of MyCloudManager; we'll detail them later.

  • Apps: application list
  • Instances: list of instances visible to MyCloudManager
  • Tasks: all ongoing or completed tasks
  • Audit: list of actions performed
  • My Instances > Console: access to the Horizon console
  • My Account > Cockpit: access to the account dashboard


All of the applications in the Apps section can be configured via the Settings button on each thumbnail.

As you can see, we have separated the settings into different sections.

In the Info section you will find a presentation of the application along with some useful links.

In the Environments section you can register the environment variables used to configure the container at launch.

In the Parameters section you can register the application's configuration settings.

To distinguish running applications from stopped ones, we have set up a color code: a started application is surrounded by a green halo, and by a yellow halo during installation.

The Tasks section tracks the actions performed on MyCloudManager, reported in relative time.


You can cancel a pending or failed task in the Tasks menu by clicking the clock icon, which then changes into a trash icon.

We also implemented an Audit section so you can see all actions performed on each of your instances and export them to Excel (.xlsx) via the export button, whether for post-processing or to keep this information for audit purposes.


Finally, we integrated two navigation paths in the MyCloudManager menu: My Instances and My Account. They are used, respectively, to access the Cloudwatt Horizon console and to manage your account via the Cockpit interface.

Add instances to MyCloudManager

To add instances to MyCloudManager, there are three steps:

  1. Attach the instance to the MyCloudManager router
  2. Run the attachment script
  3. Start the desired services

1. Attach the instance to the MyCloudManager router:

$ neutron router-interface-add $MyCloudManager_ROUTER_ID $Instance_subnet_ID

You will find all the required information by inspecting the stack's resources via the following heat command:

$ heat resource-list $stack_name

Once this is done, you can add your instance to MyCloudManager to instrument it.

2. Start the attachment script:

On MyCloudManager, go to the Instances menu and click the button at the bottom right.

We offer two commands to choose from, one using curl and one using wget, to run the script that registers the instance.


Once the script has run on the selected instance, it should appear in the Instances menu of your MyCloudManager.


Trick: if you want to create an instance via the Cloudwatt Horizon console and register it directly in your MyCloudManager, select the MyCloudManager network and security group in step 3 of the instance launch wizard, and in step 4 paste the command shown under the sentence “If you want to register the instance automatically during the creation process, put this in the startup script within the horizon console :” into the Custom Script field.



3. Start the required services on the instance:

To help you as much as possible, we created Ansible playbooks that automatically install and configure the agents for the different applications.

To do this, simply click on the application you want to install on your machine. The corresponding Ansible playbook will run automatically. Once the application is installed, its logo switches to color, letting you identify the applications installed on your instances.


The MyCloudManager services provided by applications

In this section, we present the different MyCloudManager services.

Monitoring and supervision

We have chosen to use Zabbix, the most popular application for monitoring, supervision and alerting. Zabbix is free software that monitors the status of various network services, servers and other network devices, as well as the applications and software deployed on the supervised servers, and produces dynamic graphs of resource consumption. Zabbix uses MySQL, PostgreSQL or Oracle to store its data; with a large number of machines and metrics to monitor, the choice of DBMS greatly affects performance. Its web interface is written in PHP and provides a real-time view of the collected metrics.

To go further, here are some helpful links :


Log Management

We chose Graylog, which is the product of the moment for log management. Here is a short presentation: Graylog is an open source log management platform capable of manipulating and presenting data from virtually any source. This container is the one officially offered by the Graylog team.

  • The Graylog web interface is a powerful tool that lets anyone use everything Graylog has to offer through an intuitive and appealing web application.
  • At the heart of Graylog is its own strong software: Graylog Server interacts with all other components using REST APIs, so that each component of the system can be scaled without compromising the integrity of the system as a whole.
  • Real-time search results when you want them and how you want them: Graylog is able to provide this thanks to the tried and tested power of Elasticsearch. The Elasticsearch nodes behind the scenes give Graylog the speed that makes it a real pleasure to use.

With this impressive architecture and a large library of plugins, Graylog stands as a strong and versatile solution for managing logs both from the instances themselves and from the applications and software deployed on the monitored instances.

To go further, here are some helpful links :


Job Scheduler

We have chosen to use Rundeck. The Rundeck application lets you schedule and organize, via its web interface, all the jobs that you want to run consistently across your entire fleet.

In the next version of MyCloudManager, we will give you the ability to back up your servers, as we saw in the Duplicity bundle.

To go further, here are some helpful links :


Mirror Antivirus

This application is an Nginx server. A cron script runs every day to fetch the latest virus definitions distributed by ClamAV. The retrieved packages are exposed to your instances via Nginx, allowing your ClamAV clients to stay up to date even when your instances do not have internet access.

To go further, here are some helpful links :


Software repository

We have chosen to use Artifactory. Artifactory is an application that can expose any type of artifact repository, served via Nginx. Our aim here is to offer an application that exposes a repository to all of your instances.

To go further, here are some helpful links :


Time Synchronisation

We have chosen to use NTP. The NTP container is used here so that all of your instances, even those without internet access, can synchronize to the same time server.

To go further, here are some helpful links :


The MyCloudManager v1 (Beta) component versions

  • CoreOS Stable 899.13.0
  • Docker 1.10.3
  • Zabbix 3.0
  • Rundeck 2.6.2
  • Graylog 1.3.4
  • Artifactory 4.7.5
  • Nginx 1.9.12
  • Aptly 0.9.6
  • SkyDNS 2.5.3a
  • Etcd 2.0.3

List of distributions supported by MyCloudManager

  • Ubuntu 14.04
  • Debian Jessie
  • Debian Wheezy
  • CentOS 7.2
  • CentOS 7.0
  • CentOS 6.7

Application configuration (by default)

As explained before, the Settings button on each thumbnail lets you enter all of an application's settings before launching its container. If you did not, don't worry: you can always change the login and password inside the application.

Default logins and passwords for MyCloudManager applications:

  • Zabbix - Login: admin - Password: zabbix
  • Graylog - Login: admin - Password: admin
  • Rundeck - Login: admin - Password: admin

The other applications have no web interface, and therefore no login/password, except Artifactory, which has a web interface but no authentication.


Troubleshooting

Although its architecture is based on Docker containers and the Kubernetes orchestrator, MyCloudManager may sometimes have trouble instrumenting your instances. Some leads:

  • Make sure your VPN connection is active.
  • Otherwise, restart your VPN.
  • Refresh the MyCloudManager page in your browser (F5).
  • If your toolbox is active and you are connected to the VPN but cannot reach http://manager.default.svc.mycloudmanager, try the fixed IP address instead. If that URL works, the DNS has not been changed on your computer; you must then disable any antivirus or firewall that could be blocking this connection. The DNS settings are located in
  • If you cannot launch applications when you click GO due to DNS problems, try the fixed address, then click GO again. Your applications will launch with the fixed IP address (e.g. for Zabbix).
  • Feel free to flush the DNS cache via the command `ipconfig /flushdns`.
  • If your new instance does not appear in MyCloudManager, check that you have included the security group of your MyCloudManager stack on your instance. Be careful with the networking aspects: your instance has to be able to communicate with your MyCloudManager to be instrumented.
  • We have tested MyCloudManager with Chrome. Some cosmetic differences may appear with other web browsers.

So watt?

The goal of this tutorial is to accelerate your start. At this point you are the master of the stack.

You now have SSH access to your virtual machine through the floating IP and your private keypair (default user name: core).

You can access the MyCloudManager administration interface via the URL MyCloudManager

And after?

This article will acquaint you with this first version of MyCloudManager. It is available to all Cloudwatt users in Beta mode and therefore currently free.

The intention of the CAT (Cloudwatt Automation Team) is to provide improvements on a bimonthly basis. Our roadmap includes, among other things:

  • Instrumentation of Ubuntu 16.04 instances (possible today, but only via the curl command),
  • A French version,
  • Several operational effectiveness enhancements,
  • Addition of the backup function,
  • An HA version,
  • An additional menu to contact the Cloudwatt support teams or order a cloud coaching service,
  • Support for a second region,
  • And many other things.

Suggestions for improvement? Services that you would like? Do not hesitate to contact us.

Have fun. Hack in peace.


by The CAT at May 17, 2016 10:00 PM


RDO blogs over the last few weeks

I've been traveling a lot over the last few weeks, and have fallen behind on the blog post updates. Here's what RDO enthusiasts have been blogging about since OpenStack Summit.

I posted a number of "What Did You Do In Mitaka" interview posts, so here those are, all together:

Additionally, there were the following:

Deploying the new OpenStack EC2 API project by Tim Bell

OpenStack has supported a subset of the EC2 API since the start of the project. This was originally built in to Nova directly. At CERN, we use this for a number of use cases where the experiments are running across both the on-premise and AWS clouds and would like a consistent API. A typical example of this is the HTCondor batch system which can instantiate new workers according to demand in the queue on the target cloud.

…

Running Keystone Unit Tests against older Versions of RDO Etc by Adam Young

Just because upstream is no longer supporting Essex doesn’t mean that someone out there is not running it. So, if you need to back port a patch, you might find yourself in the position of having to run unit tests against an older version of Keystone (or other) that does not run cleanly against the files installed by tox.

…

Containers and the CERN cloud by Ricardo Rocha

In recent years, different groups at CERN started looking at using containers for different purposes, covering infrastructure services but also end user applications. These efforts have been mostly done independently, resulting in a lot of repeated work especially for the parts which are CERN specific: integration with the identity service, networking and storage systems. In many cases, the projects could not complete before reaching a usable state, as some of these tasks require significant expertise and time to be done right. Alternatively, they found different solutions to the same problem which led to further complexity for the supporting infrastructure services. However, the use cases were real, and a lot of knowledge had been built on the available tools and their capabilities.

…

Meet Red Hat OpenStack Platform 8 by Sean Cohen

Last week we marked the general availability of our Red Hat OpenStack Platform 8 release, the latest version of Red Hat’s highly scalable IaaS platform based on the OpenStack community “Liberty” release. A co-engineered solution that integrates the proven foundation of Red Hat Enterprise Linux with Red Hat’s OpenStack technology to form a production-ready cloud platform, Red Hat OpenStack Platform is becoming a gold standard for large production OpenStack deployments. Hundreds of global production deployments and even more proof-of-concepts are underway, in the information, telecommunications, financial sectors, and large enterprises in general. Red Hat OpenStack Platform also benefits from a strong ecosystem of industry leaders for transformative network functions virtualization (NFV), software-defined networking (SDN), and more.

…

OpenStack Summit Austin: Day 1 by Gordon Tillmore

We’re live from Austin, Texas, where the 13th semi-annual OpenStack Summit is officially underway! This event has come a long way from its very first gathering six years ago, where 75 people gathered to learn about OpenStack in its infancy. That’s a sharp contrast with the 7,000+ people in attendance here, in what marks Austin’s second OpenStack Summit, returning to where it all started!

…

OpenStack Summit Austin: Day 2 by Gordon Tillmore

Hello again from Austin, Texas where the second busy day of OpenStack Summit has come to a close. Not surprisingly, there was plenty of news, interesting sessions, great discussions on the showfloor, and more.

…

Culture and technology can drive the future of OpenStack by E.G.Nadhan

“OpenStack in the future is whatever we expand it to”, said Red Hat Chief Technologist, Chris Wright during his keynote at the OpenStack Summit in Austin. After watching several keynotes including those from Gartner and AT&T, I attended other sessions during the course of the day culminating in a session by Lauren E Nelson, Senior Analyst at Forrester Research. Wright’s statement made me wonder about what lies in store for OpenStack and where would the OpenStack Community — the “we” that Wright referred to — take it to in the future.

…

OpenStack Summit Austin: Day 3 by Gordon Tillmore

Hello again from Austin, Texas where the third day of OpenStack Summit has come to a close. As with the first two days of the event, there was plenty of news, interesting sessions, great discussions on the showfloor, and more. All would likely agree that the 13th OpenStack Summit has been a Texas-sized success so far!

…

Resource management at CERN by Tim Bell

As part of the recent OpenStack summit in Austin, the Scientific Working group was established looking into how scientific organisations can best make use of OpenStack clouds.

…

OpenStack Summit Austin: Day 4 by Gordon Tillmore

Hello again from Austin, Texas where the fourth day of the main OpenStack Summit has come to a close. While there are quite a few working sessions and contributor meet-ups on Friday, Thursday marks the last official day of the main summit event. The exhibition hall closed its doors around lunch time, and the last of the vendor sessions occurred later in the afternoon. As the day concluded, many attendees were already discussing travel plans for OpenStack Summit Barcelona in October!

…

OpenStack Summit Newton from a Telemetry point of view by Julien Danjou

It's again that time of the year, where we all fly out to a different country to chat about OpenStack and what we'll do during the next 6 months. This time, it was in Austin, TX and we chatted about the new Newton release that will be out in October.

…

Identity work for the OpenStack Newton release by Adam Young

The Newton Summit is behind us, and we have six months to prepare for the next release in both upstream OpenStack and RDO. Here is my attempt to build a prioritized list of the large tasks I want to tackle in this release.

…

Mitaka Cinder Recap sessions by Gorka Eguileor

During Mitaka we introduced some big changes in Cinder that have a great impact for developers working on new and existing functionality. These new features include, but are not limited to, API microversions, support for Rolling Upgrades, and conditional DB update functionality to remove API races. So we decided to have Recap Sessions during the OpenStack Summit in Austin.

…

Analysis of techniques for ensuring migration completion with KVM by Daniel Berrange

Live migration is a long standing feature in QEMU/KVM (and other competing virtualization platforms), however, by default it does not cope very well with guests whose workload are very memory write intensive. It is very easy to create a guest workload that will ensure a migration will never complete in its default configuration. For example, a guest which continually writes to each byte in a 1 GB region of RAM will never successfully migrate over a 1Gb/sec NIC. Even with a 10Gb/s NIC, a slightly larger guest can dirty memory fast enough to prevent completion without an unacceptably large downtime at switchover. Thus over the years, a number of optional features have been developed for QEMU with the aim of helping migration to complete.

…

What did everyone do for the Mitaka release of OpenStack ? by David Moreau Simard

Just what did everyone do for the Mitaka OpenStack release ? RDO community liaison Rich Bowen went to find out. He interviewed some developers and engineers that worked on OpenStack and RDO throughout the Mitaka cycle and asked them what they did and what they were up to for the Newton cycle.

…

by Rich Bowen at May 17, 2016 06:12 PM

OpenStack Nova Developer Rollup

API-Ref Daily Update

No change from yesterday, but we’re chugging along!

Sean Dague recently pushed some changes to the Burndown chart to include a table of what still needs to be done. The data for this chart is available in both JSON and text formats. Now I can just import that data into my spreadsheet without having to manually tweak things, making this whole thing go a lot quicker.

[Figure: API-ref status for May 17, 2016 (click for larger image)]

Click here for more information about the API-Ref project.

by auggy at May 17, 2016 05:00 PM

OpenStack Superuser

Hate the taxes, not the online platform: HMRC's journey with OpenStack

It’s the busiest tax day of the year and millions of people are filing online. In just 24 hours, Her Majesty's Revenue and Customs will take in £350 million (about $505 million).

Instead of sweating it, the team behind the digital platform of HMRC spent part of the day noshing pizza and playing LAN tournaments.

“In live ops, the biggest success stories are always anti-climaxes,” says Tim Britten, product owner for the HMRC digital platform. “The 31st of January 2016 was really boring for us. We didn't have to do anything. We knew that we were resilient across data centers…That has never really happened before. Normally, we would be absolutely bricking it. We have a lot of self-healing containerizations (so) it looks after itself at the moment.”

It's really good to be afraid

In just four months, a team of four engineers reached that milestone. Britten and his co-workers could relax about the performance of HMRC’s multi-channel digital tax platform (MDTP) because dev-ops consultant Philip Harries had doubled down on the fear factor.

“If you're building any kind of infrastructure, any engineering project, big or small, it's really good to be afraid,” says Harries. “The glass is half-empty. You've got to be pessimistic, you've got to plan for failure. You've got to be resilient against any kind of failure in your system.”

Harries’ strategy to protect HMRC against failure--whether it be from infrastructure bugs, human error or zero-day vulnerabilities that might bring down an entire data center--was to go with multiple vendors.

Those vendors include DataCentred, which provides the OpenStack public cloud and VMware from Skyscape Cloud Services, a company founded to provide cloud computing services through the G-Cloud initiative.

HMRC essentially runs a web gateway for people to interface with the government for their tax affairs. Here’s a look at the architecture that bolstered a “boring” tax day. Requests go to an Akamai content delivery network (CDN) before being farmed out to each provider. There’s a public-facing zone, with networks, micro-services and Mongo database clusters. Then comes a layer of proxies between that and a protected zone, which is also full of micro-services and MongoDB clusters. Finally, there’s a private layer on the Skyscape side only.

“There are more secure processes — but it has nothing to do with the customer. We can actually lose that without any interruption to the customer journey,” Harries says. Behind that, HMRC doesn’t actually store any data permanently in the infrastructure. There are a couple of secure data centers--physical data centers--linked up by virtual private networks (VPNs).

Time for revolution, not evolution

That uneventful 2015 tax day was the happy ending of HMRC’s journey with OpenStack that also resulted in winning the UK IT Awards Digital Project of the year. The trip started back in 2010, when Martha Lane Fox, a peer whose previous experience included co-founding, issued a report on reforming the British government's digital service saying that “government needs to move to a ‘service culture,’ putting the needs of citizens ahead of those of departments.”

At the time, HMRC was anything but agile. Its services were waterfall deliveries, there was a ‘massive amount’ of outsourcing and typically six-month release cycles. “We would have huge and huge amounts of change on one weekend or two weekends of the year. If you did something wrong or you had a bit of content wrong on the page, you wouldn't be able to get that in until six months later,” Britten says. That’s painfully slow — considering that HMRC is responsible for 50 percent of all transactions with the British government.

Following the report pushing for “revolution not evolution,” the Government Digital Service was created with a mandate to revolutionize digital services. Three of the 25 pilot projects were at HMRC.

“We realized that the only way were going to do this was if we built a new department within HMRC that was outside the confines of the current organization,” Britten says. “We didn't use any of the corporate networks. We went out, we bought MacBooks. We brought in people from the outside. We started a small skunk works to deliver these services.”


Britten recalls that period in 2013 as “absolutely awesome” with the small team using unlocked laptops, talking directly to users for the first time, delivering services in-house and jazzed about the commitment to code in the open --you can find HMRC's code on GitHub.

“We got a little bit too excited about all the functionality building. We forgot about infrastructure,” he admits. Then when it came time they scrambled for infrastructure supply, because “we couldn’t simply whack out a credit card and get AWS” since the British government and the public are stringent about data handling and privacy.

HMRC was restricted to finding a cloud supplier that could provide what they needed and the availability they needed but also wasn't US-based. There was one vendor who met the requirements. “As we go into the future, we plan to have three different technologies underlying our infrastructure,” Britten says, including Amazon Web Services (AWS) if the service becomes available in the UK. “We spread those bets evenly.”

The team hit on the fact that those three initial projects shared components for what they decided to call the tax platform, but Britten adds that “we didn't think of it as a platform-as-a-service (PaaS) or anything. We just had to build stuff.”

"If you fail...scary things start to happen"

The team built a shared infrastructure running the micro-services architecture on Docker containers, got the three initial services live, and then were almost drowned by their own success.

“We have people phoning us up saying, ‘By the way, we're going to set up a delivery center in Newcastle. It's going to have 20 agile teams.’ We went from three, to five, and, suddenly, bam, we went to 20. Then, someone phones up again and says, "By the way, we're setting up another delivery center…and they're going to have another 20 teams.'" At this point, Britten says they were “desperately” trying to scale. “We're forced into a position where we have to build a PaaS,” he says.

Higher-ups decided to move all online HMRC payments to the tax platform and the platform became the only way for people to file self-assessment returns. Previously, these services were hosted by an incumbent supplier. There had been some outages, which at the time weren't that worrying, but Britten says they realized that in January, the tax platform service would take around £150 million in a day during the tax season peak.

Britten and team were becoming the main people in HMRC to deliver or run services for the UK government tax authority — yet the infrastructure ran on the shoulders of one provider and tax season was looming.

“If you fail, if you get downtime at that point, scary things start to happen,” Britten says. “If you have to delay the tax deadline, the prime minister and the chancellor have to meet and sign that off. The treasury have to start borrowing money to cover the loss that they’d have in interest. We're in October and we start to go, ‘This is kind of worrying.’”

The search for another UK-based provider led to DataCentred and OpenStack. “We were like, all right, we know that the OpenStack API is really versatile. These guys look good. This is our best bet."

With just a few months to go before tax season, they buckled down.

Without the time to push infrastructure changes out by going through dev and seeing if they work, QA, staging and then into production, the team started building the staging environment. That new staging environment was used for functional testing, as well as performance testing. Before it was finished, they were tasked with building out the production version. The team functionally tested it in November in staging, without abandoning the production build.

“On Christmas eve, which is actually an awesome day to have a full outage if you're the tax authority because no one does their tax then, we started a 48-hour outage of our current production, which was running on one supplier,” Britten says.

Then, he says, “we did something I wouldn't recommend to anyone.” Just weeks before tax day, they turned off all of the tax systems, replicated all the data, populated the new Mongo clusters. Then, they switched over and tried to test. “It was awful. Eventually, everything woke up and got it working over about an hour.”

Then they turned off Skyscape for the first time in two-and-a-half years, relying solely on DataCentred. Then they switched back. “As easy as that, you can switch through the suppliers in terms of how much traffic you're putting through them.”

Image courtesy HMRC.

The current infrastructure has proved sturdy — and provided a different kind of downtime for the team.

“I always look at Twitter when we do the January peak,” Britten says. “If you look at previous years people are like, ‘I can't log in.’ Now, they're just having a go at the tax authority, which is awesome. That's what you want to see. People not actually not being able to pay their tax, just really annoyed that they had to.”

You can watch the entire 34-minute talk from Britten and Harries from the Austin Summit on the OpenStack Foundation’s YouTube channel.


Cover Photo // CC BY NC

by Nicole Martinelli at May 17, 2016 04:14 PM


Which came first, the hardware or the software?

The post Which came first, the hardware or the software? appeared first on Mirantis | The Pure Play OpenStack Company.

Supermicro OpenStack Optimized Hardware Designs

With the emergence of every new software architecture, there typically emerges a corresponding new hardware architecture. There is a chicken and egg element to this emergence: did new hardware designs enable a new software model or vice versa? Does the web happen without open source Linux and Apache? Does open source Linux and Apache happen without general purpose X86 servers?

So the question becomes, does OpenStack and private cloud drive the emergence of new optimized hardware designs to support the value, scale, and efficiency that users are looking to achieve with their OpenStack Private Cloud deployments, or do those optimized designs make it possible for OpenStack and private cloud to emerge in the first place?

At the heart of the question is the concept of workload-optimized hardware, which enables you to squeeze that last bit of performance out of your system by focusing only on what matters at the time. Supermicro, which saw 36% revenue growth in 2015, saw strong demand from two specific segments that highly value optimized hardware design: hyper-converged appliances and Internet cloud service providers.

Why appliances?

If you go out and look under the skin of many server and storage appliances on the market, chances are you will find Supermicro systems inside. Why? One reason is Supermicro’s building-block architecture, which enables an appliance designer to customize the hardware design to meet the specific requirements of the service or workload. Need more power, leading-edge NVMe technologies, a smaller form factor, or more I/O? The massive portfolio of systems and configurations available from Supermicro enables that optimization of the system design to the software and application. So the appliance vendor is able to break the chicken-and-egg cycle of which came first, the software or the hardware, and deliver an optimized solution.

Why cloud service providers?

Another place this kind of model works is at the macro level, with Internet cloud service providers. In this case, the efficiencies achieved by an optimized hardware design get multiplied by the hundreds, thousands, or even tens of thousands for a cloud service provider deploying at scale. So does the fact that these systems can be built provide an opportunity for OpenStack to run these large Internet cloud service providers (and make no mistake, it does), or does the fact that OpenStack helps make it possible for these providers to exist increase the need for specialized hardware?

Looking closely at the situation

It’s fine to discuss all of this in the abstract, of course, but at the end of the day, what if I am not an appliance vendor or a massive cloud data center and am looking to deploy OpenStack? In this case, one opportunity for workload-optimized hardware design is the Supermicro Mirantis Unlocked Appliance for Cloud Native Applications. Leveraging the building-block architecture developed for appliances, cloud service providers and the enterprise, Supermicro worked in close partnership with Mirantis engineering to develop an OpenStack-optimized design for the appliance, tuning the architecture from a performance, density, and power-efficiency perspective to deliver maximum value for deploying OpenStack private cloud.

So which came first? The software or the hardware?

And at the end of the day, as long as you’re getting the solution you need, does it really matter?

Find out more about the Supermicro Mirantis Unlocked Appliance for Cloud Native Applications and about Supermicro at

Michael McNerney, General Manager Solution Enablement and Marketing, Supermicro Inc.
Michael McNerney has served in a number of strategic roles in development, strategy and marketing.  He currently is the General Manager for Solution Enablement and Marketing at Supermicro.  He is responsible for developing Supermicro’s Solution Business and Hyperconverged product offerings.   In this role, Mr. McNerney leverages 20+ years of experience in enterprise computing, extensive customer relationships and a network of architects, engineers and other thought leaders to deliver market leading products.  He is responsible for bringing new products to market and driving the ongoing business through product launches, pricing, benchmarking, training, web, social media, analyst/public relations etc.


by Guest Post at May 17, 2016 02:16 PM


Cooking Some "Mitaka" Flavoured OpenStack on Your Local Machine

About the author

Jan Klare is the co-founder of the cloudbau GmbH and a big automation and bleeding edge technology fan. He is a core reviewer in the official Openstack-Chef project and was the project team lead for the Mitaka cycle. In addition to automating the deployment of OpenStack he loves to play with new and fancy automation and orchestration technologies.

To learn more about cloudbau visit their website:

cloudbau GmbH


Hi everybody, and welcome to the OpenStack-Chef kitchen. Today we are going to cook ourselves some nice and tasty OpenStack Mitaka. And since we are not any ordinary, boring one-node-only cooking show, we will cook a whole cluster of it to satisfy the cravings of all your developers at once. Let's get started then!

The ingredients

To get started with this, your kitchen should contain at least 16GB of RAM, 4 cores and some hard disk space to play on. The kitchen I am cooking in today is a simple MacBook Pro with 16GB of RAM and a 3.1 GHz i7 from early 2015. If your kitchen lacks some of this equipment, you can still try, but there is no guarantee that you will be able to cook the same cluster we are going for without running out of resources.

We will be using VirtualBox and Vagrant; before you start, you should get and install them according to the guides for your platform.

In addition to the kitchen itself we will need a whole bunch of cookbooks, but luckily you can get all of them from a one-stop Berkshelf directly from the openstack-chef-repo. Just pull the repo to your favorite git location.

Since we will be using a lot of the standard tools of the Chef business, you should either have all the needed gems already bundled up somewhere, or you should download and install the fitting ChefDK, version 0.9.0. I recommend going with ChefDK 0.9.0, since we used it too and it worked. In case you wonder why we are not using the most recent version: we tested with 0.9.0 during the whole development cycle and it still works, and somebody said “never change a winning team”.

If you have prepared all the things mentioned above and already have some appetite for a nice and tasty OpenStack, you should go ahead and continue with the next steps.

Mise en Place

To get started you should cd to the openstack-chef-repo you pulled before and have a quick look at some of the core documents we will use during the actual cooking.

The Rakefile

The first and most mysterious thing here is the Rakefile, which contains a lot of useful methods and has them already bundled in tasks to deploy and test different scenarios of OpenStack. Of course you can go ahead and read the whole file (which might even be a good idea in case you want to continue working with this chef-repo), but for today we will just use three of these tasks:

  1. berks_vendor: As stated in the description, this task will pull in all of the needed cookbooks for today’s cooking session. It will read the Berksfile, use berkshelf to resolve all the dependencies and download all cookbooks to the cookbooks folder inside the openstack-chef-repo.

  2. multi_node: This task automates all the cooking for you; the only thing you have to do is watch how fast it builds a local three-node OpenStack cluster. It will read vagrant_linux.rb and pull in the needed vagrant box. This can be either ubuntu 14.04 or centos 7.2, which is completely up to you and switchable via the environment variable ‘REPO_OS’. In addition, it will use multi-node.rb, which specifies the exact configuration of all virtual machines (controller_config and compute_config) and how they should be created with chef-provisioning. It will create one controller node with the role ‘multi-node-controller’ and two compute nodes with the role ‘multi-node-compute’. So after running this, we will have a total of three nodes: controller, compute1 and compute2.

  3. clean: This task is pretty straightforward and well described with “Blow everything away”. You can and should use it in case you get bored after you have seen the whole cluster work or in case you get stuck somewhere and want to start fresh. Since it says “Blow everything away”, you really will need to start at the beginning by vendoring your cookbooks (1).
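Putting the three tasks together, a typical session looks something like the sketch below. It assumes ChefDK 0.9.0, Vagrant and VirtualBox are installed and that you are inside your openstack-chef-repo checkout; the exact REPO_OS value strings are an assumption on my part, so check vagrant_linux.rb for the ones your checkout expects.

```shell
# Hypothetical session; run from the root of the openstack-chef-repo.
export REPO_OS=ubuntu14.04   # assumed value; see vagrant_linux.rb for
                             # the exact ubuntu 14.04 / centos 7.2 names

chef exec rake berks_vendor  # 1. vendor all cookbooks into ./cookbooks
chef exec rake multi_node    # 2. build controller, compute1 and compute2
chef exec rake clean         # 3. blow everything away when you are done
```

Running the tasks through `chef exec` ensures they use the gems bundled with ChefDK rather than whatever is on your system Ruby.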

The environment

As you might have seen already in the multi-node.rb, we will also use a distribution specific environment, to get some additional flavor into our cluster. Since the two environment files are very similar, we will just look at the ubuntu one for now and you should be able to walk yourself through the centos one if you need it.

apache2 and apt

Starting from the top, we will reset node['apache']['listen'] to [] so that the default is no longer used, since it would interfere with our configuration. Additionally, we will configure apt to run a full update of its sources at compile time, so we can install the newest packages right from the beginning (even during compile time).

And now for the more interesting part: OpenStack

Since we want to deploy OpenStack with the openstack-chef-cookbooks, we will need to pass some attributes to these, to align them with all our expectations for a three node cluster.


The first thing we want to do here, is to allow all nodes to forward network traffic. This is needed, since we want to run our routers and dhcp namespaces on the controller and connect them via openvswitch (ovs) bridges to the instances running on the two compute nodes.


To actually allow all the OpenStack services to talk to each other, either via the message queue or directly via the APIs, we need to define the endpoints we want to use. Since all of the APIs and the message queue (mq) will be running on our controller node, we will configure one of its IP-addresses (‘’) as the ‘host’ attribute for the endpoints and the mq. With this configuration, all of the OpenStack service APIs will be reachable via their default ports (e.g. 9696 for neutron) on the address ‘’ (e.g. ‘’ for neutron).

binding services

Right below the endpoint settings, we see a whole block that looks quite similar to the endpoint one, but is called ‘bind_service’. In addition to the endpoints where a service will be reachable, we also need to define where the actual service should be listening. You might think that this is the exact same thing, but it’s not. In most production environments you will need additional proxies like ‘haproxy’ or ‘apache’ right in front of your APIs for security, filtering, threading and HA. That said, the endpoint where your API is reachable might in fact be an ‘apache’ or ‘haproxy’ listening on a completely different IP and port than your actual OpenStack service. During the design of the cookbooks we decided to bind all of the services to ‘’ by default, so we get some security out of the box and do not make them world accessible. In our scenario today, however, we need them to be accessible by our compute nodes and from outside the vagrant boxes (since we may want to test the CLI tools on our local machine against the APIs), and we will therefore bind them to ‘’ to avoid a more complex configuration. This makes them accessible on their default ports via all IP addresses assigned to the controller node.

The next important setting in our environment is the detailed, attribute-driven configuration of the networking service neutron.

Neutron ml2 plugin

In the first section we configure the ml2 plugin we want to use for our virtual networks. In this case we want to go with the default ml2 plugin using vxlan as the overlay to separate tenant networks.

Neutron ovs tunnel interface

To allow the actual traffic to flow between instances and router/dhcp namespaces, we need to additionally specify the interface we want to create our overlay vxlan ovs bridge on. For our scenario this will be the ‘eth1’ interface on the controller and compute nodes. The actual ovs bridge configuration will be done with the example recipe from the network cookbook.

neutron.conf and ml2_plugin.ini

In this final networking section, we set some configuration parameters that go directly into neutron.conf. In our scenario we want to use the neutron-l3-agent and therefore need to enable the ‘router’ service plugin. We also need to specify where the neutron-openvswitch-agent running on our compute nodes can find the mq to talk to the other Neutron agents. And since we want to use vxlan as the default for our tenant networks, we need to specify this as well.
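Once the deployment has run, you can verify on the controller that these attributes made it into the rendered configuration and that the API really listens on all addresses. A hedged sketch — the file paths and the 9696 port are the usual neutron defaults and may differ on your distribution:

```shell
# The router service plugin should be enabled in neutron.conf
grep service_plugins /etc/neutron/neutron.conf

# vxlan should be configured as the tenant network type for ml2
grep tenant_network_types /etc/neutron/plugins/ml2/ml2_conf.ini

# The neutron API should be listening on (all addresses)
ss -tln | grep :9696
```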

upload default cirros image

To be able to instantly start some instances after we have the cluster up and running, we are also enabling the image upload of a simple cirros image.


The last section in our environment is dedicated to configuring Nova. Since we will be running our cluster on top of virtualization, we do not want to use the default ‘virt_type’ kvm, but rather go with qemu. The last option in the ‘oslo_messaging_rabbit’ section enables nova-compute to talk to the mq and all of the other nova services (same as for the neutron-openvswitch-agent).

I guess we have now spent enough time on our Mise en Place and should start the actual cooking; people are getting hungry.

Cooking like a boss

As all of you chefs might know, if you have done a good Mise en Place, cooking becomes a breeze. The only thing we need to do now to get things started is to fetch all our cookbooks with the ‘berks_vendor’ task (1) and run the ‘multi_node’ task (2) from the Rakefile mentioned above. If you are using chefdk you can do this by running:

chef exec rake berks_vendor
chef exec rake multi_node


If your first chef run fails while installing the package “cinder-common”, you are probably on a Mac; there seems to be a strange issue with handing over the locales during a chef run. Just start the run again with:

chef exec rake multi_node

You should now get yourself a coffee and maybe even some fresh air, since this will take a while.

Serving the Mitaka flavored OpenStack cluster

After roughly 15 to 20 minutes, depending on your kitchen hardware, you will have a full OpenStack Mitaka ready for consumption. Now let’s dig into it!


At the time of writing, there is a rather unpleasant bug in the startup of libvirt-bin: the default logging service virtlogd seems to be started but instantly crashes. The bug is documented on launchpad and can simply be fixed by starting the virtlogd service manually or rebooting the compute nodes. To start the service manually, ssh to compute1 and compute2 and run:

sudo service virtlogd start

After that you should be good to go.

Most people like to start with the good-looking stuff, so we will go ahead and navigate to the dashboard, which should be accessible at https://localhost:9443. You can log in as the ‘admin’ user with the password ‘mypass’.

You should really enjoy this part a bit longer: maybe create some networks and routers, and even launch some instances directly off the cirros image that was automagically uploaded.

If you enjoyed the dashboard, you may want to dig a bit deeper and try the command line clients directly from the controller. To do so, go back to your openstack-chef-repo and navigate to the subfolder ‘vms’. Inside that folder you can use vagrant to ssh directly to the controller or one of the compute nodes like this:

# ssh to controller
vagrant ssh controller
# ssh to the first compute node
vagrant ssh compute1

Once you are on the controller, you should become ‘root’ and load the environment variables from the provided openrc file in /root/openrc like this:

# become root
sudo -i
# load openrc environment variables
. /root/openrc

Since all of the python clients you need to talk to the OpenStack APIs were already installed during the deployment, you can now go ahead and use them to either do the same things you did in the dashboard above (create networks and routers, launch some instances) or try something new and play a little with heat, since chefs usually love hot cooking.
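As a starting point, here is a hedged sketch of the same network/router/instance workflow on the CLI. The demo-* names are made up, the cirros image name and m1.tiny flavor are the usual defaults, and exact syntax may differ between client versions:

```shell
# Create a tenant network with a subnet and wire it to a new router.
neutron net-create demo-net
neutron subnet-create demo-net --name demo-subnet
neutron router-create demo-router
neutron router-interface-add demo-router demo-subnet

# Boot an instance off the cirros image that was uploaded during the run.
nova boot --image cirros --flavor m1.tiny \
  --nic net-id="$(neutron net-show -f value -c id demo-net)" demo-instance

# Watch it come up
nova list
```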

Cleaning up

As soon as you decide that you have had enough OpenStack for today, you can exit the controller node, navigate back to the openstack-chef-repo root directory and clean up your whole kitchen with the ‘clean’ task (3) from the Rakefile mentioned at the beginning.

chef exec rake clean

I think that’s it for today. If you have any questions regarding this setup, come and find me and the other openstack-chef core reviewers in the #openstack-chef IRC channel on freenode.

Happy cooking!

May 17, 2016 12:43 PM


EMC introduces Native Hybrid Cloud

The post EMC introduces Native Hybrid Cloud appeared first on Mirantis | The Pure Play OpenStack Company.

Last month at the OpenStack summit in Austin, EMC talked about its Build-Extend-Optimize strategy, intended to help companies get the most out of the cloud.  This week, at EMC World, the company made two new announcements, both focused on helping customers more easily bridge the so-called “OpenStack skills gap” holding back adoption.

Previewed last year as Project Caspian, Neutrino nodes will now be available for EMC’s VCE VxRack product. These nodes will have “turnkey OpenStack” pre-installed, and customers will be able to start with as few as four and scale up to many hundreds. The nodes were conceived as a way to deliver a hyperconverged infrastructure with OpenStack on commodity hardware. Neutrino nodes will also be available for vSphere and VMware Photon-based deployments.

The idea is to curate different options to make it easier for IT departments.

The systems also include the Cloud Foundry PaaS, and are intended as an all-in-one shop for internal cloud development — and a challenge to AWS. They’re also intended for cloud-native applications, which typically aren’t as good a fit for EMC’s Enterprise Hybrid Cloud product.

Also challenging AWS is the new VirtuStream storage cloud, which provides much of the same function as AWS S3.

The new VxRack starts with 4 nodes and will start at $300,000. It will be available later this year.

EMC also launched Polly, open source software designed to help with container storage scheduling. “Short for polymorphic volume scheduling,” Container Journal wrote,  “Polly provides a centralized storage scheduling service that connects to container schedulers. EMC is also pledging to further enhance Polly to create a framework that enables the scalable offer-acceptance pattern of consuming volumes across a broad array of container and storage platforms.”


The post EMC introduces Native Hybrid Cloud appeared first on Mirantis | The Pure Play OpenStack Company.

by Nick Chase at May 17, 2016 10:59 AM

Major Hayden

Troubleshooting OpenStack network connectivity

NOTE: This post is a work in progress. If you find something that I missed, feel free to leave a comment. I’ve made plenty of silly mistakes, but I’m sure I’ll make a few more. :)

Completing a deployment of an OpenStack cloud is an amazing feeling. There is so much automation and power at your fingertips as soon as you’re finished. However, the mood quickly turns sour when you create that first instance and it never responds to pings.

It’s the same feeling I get when I hang Christmas lights every year only to find that a whole section didn’t light up. If you’ve ever seen National Lampoon’s Christmas Vacation, you know what I’m talking about:


I’ve stumbled into plenty of problems (and solutions) along the way and I’ll detail them here in the hopes that it can help someone avoid throwing a keyboard across the room.

Security groups

Security groups get their own section because I forget about them constantly. Security groups are a great feature that lets you limit inbound and outbound access to a particular network port.

However, OpenStack’s default settings are fairly locked down. That’s great from a security perspective, but it can derail your first instance build if you’re not thinking about it.

You have two options to allow traffic:

  • Add more permissive rules to the default security group
  • Create a new security group and add appropriate rules into it

I usually ensure that ICMP traffic is allowed into any port with the default security group applied, and then I create another security group specific to the class of server I’m building (like webservers). Changing a security group rule or adding a new security group to a port takes effect in a few seconds.
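For example, opening up ICMP and SSH in the project's default group can be done with the openstack client (a sketch; flags may differ slightly between client versions):

```shell
# Allow ping into any port that uses the project's default security group
openstack security group rule create --proto icmp default

# Allow inbound SSH as well
openstack security group rule create --proto tcp --dst-port 22 default
```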

Something is broken in the instance

Try to get console access to the instance through Horizon or via the command line tools. I generally find an issue in one of these areas:

  • The IP address, netmask, or default gateway are incorrect
  • Additional routes should have been applied, but were not applied
  • Cloud-init didn’t run, or it had a problem when it ran
  • The default iptables policy in the instance is overly restrictive
  • The instance isn’t configured to bring up its network interface by default
  • Something is preventing the instance from getting a DHCP address

If the network configuration looks incorrect, cloud-init may have had a problem during startup. Look in /var/log/ or in journald for any explanation of why cloud-init failed.

There’s also the chance that the network configuration is correct, but the instance can’t get a DHCP address. Verify that there are no iptables rules in place on the instance that might block DHCP requests and replies.

Some Linux distributions don’t send gratuitous ARP packets when they bring an interface online. Tools like arping can help with these problems.
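If you suspect a stale ARP entry upstream, you can announce the instance's address by hand from inside the instance; a sketch using iputils arping, where the address and interface are placeholders for your own:

```shell
# Send three gratuitous ARP replies for out of eth0
sudo arping -U -I eth0 -c 3
```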

If you find that you can connect to almost anything from within the instance, but you can’t connect to the instance from the outside, verify your security groups (see the previous section). In my experience, a lopsided ingress/egress filter almost always points to a security group problem.

Something is broken in OpenStack’s networking layer

Within the OpenStack control plane, the nova service talks to neutron to create network ports and manage addresses on those ports. One of the requests or responses may have been lost along the way or you may have stumbled into a bug.

If your instance couldn’t get an IP address via DHCP, make sure the DHCP agent is running on the server that has your neutron agents. Restarting the agent should bring the DHCP server back online if it isn’t running.
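A quick way to check and recover the agent (a sketch; the Ubuntu service name is assumed and differs on other distributions):

```shell
# List neutron agents; a smiley in the "alive" column means the agent is up
neutron agent-list

# On the node running your agents, restart the DHCP agent if it is down
sudo service neutron-dhcp-agent restart
```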

You can also hop into the network namespace that the neutron agent uses for your network. Start by running:

# ip netns list

Look for a namespace that starts with qdhcp- and ends with your network’s UUID. You can run commands inside that namespace to verify that networking is functioning:

# ip netns exec qdhcp-NETWORK_UUID ip addr
# ip netns exec qdhcp-NETWORK_UUID ping INSTANCE_IP_ADDRESS

If your agent can ping the instance’s address, but you can’t ping the instance’s address, there could be a problem on the underlying network — either within the virtual networking layer (bridges and virtual switches) or on the hardware layer (between the server and upstream network devices).

Try using tcpdump to dump traffic on the neutron agent and on the instance’s network port. Do you see any traffic at all? You may find a problem with incorrect VLAN IDs here, or you may see activity that gives you more clues (like one half of an ARP or DHCP exchange).
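Two hedged starting points for those captures (NETWORK_UUID as above; tapXXXXXXXX is a placeholder for the instance's tap device, which you can find with `ip link` on the compute node):

```shell
# On the network node: watch DHCP traffic inside the qdhcp namespace
sudo ip netns exec qdhcp-NETWORK_UUID tcpdump -ne -i any port 67 or port 68

# On the compute node: watch the instance's tap device for ARP and DHCP
sudo tcpdump -ne -i tapXXXXXXXX arp or port 67 or port 68
```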

Something is broken outside of OpenStack

Diagnosing these problems can become a bit challenging since it involves logging into other systems.

If you are using VLAN networks, be sure that the proper VLAN ID is being set for the network. Run openstack network show and look for provider:segmentation_id. If that’s correct, be sure that all of your servers can transmit packets with that VLAN tag applied. I often remember to allow tagged traffic on all of the hypervisors and then I forget to do the same in the control plane.
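To print just that value for a network (demo-net is a placeholder; admin credentials are usually required to see provider attributes):

```shell
# Show only the VLAN ID neutron assigned to the network
openstack network show demo-net -f value -c provider:segmentation_id
```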

Be sure that your router has the VLAN configured and has the correct IP address configuration applied. It’s possible that you’ve configured all of the VLAN tags correctly in all places, but then fat-fingered an IP address in OpenStack or on the router.

While you’re in the router, test some pings to your instance. If you can ping from the router to the instance, but not from your desk to the instance, your router might not be configured correctly.

For instances on private networks, ensure that you created a router on the network. This is something I tend to forget. Also, be sure that you have the right routes configured between you and your OpenStack environment so that you can route traffic to your private networks through the router. If this isn’t feasible for you, another option could be OpenStack’s VPN-as-a-service feature.

Another issue could be the cabling between servers and the nearest switch. If a cable is crossed, it could mean that a valid VLAN is being blocked at the switch because it’s coming in on the wrong port.

When it’s something else

There are some situations that aren’t covered here. If you think of any, please leave a comment below.

As with any other troubleshooting, I go back to this quote from Dr. Theodore Woodward about diagnosing illness in the medical field:

When you hear hoofbeats, think of horses not zebras.

Look for the simplest solutions and work from the smallest domain (the instance) to the widest (the wider network). Make small changes and go back to the instance each time to verify that something changed. Once you find the solution, document it! Someone will surely appreciate it later.

The post Troubleshooting OpenStack network connectivity appeared first on

by Major Hayden at May 17, 2016 02:43 AM

Hugh Blemings

OpenStack Summit Austin 2016 Summary of Summaries


In the three editions of Lwood since the Austin Summit I provided a summary of mailing list posts where the writer had provided a summary of particular Design Summit Project sessions.

The list below is the combination of these in one (hopefully!) easily searchable list – saves trawling through each Lwood to find them :)

Project Summaries

Summaries posted to the OpenStack-Dev mailing list

Edits: 18/5/16 to add Assaf Muller’s Neutron summary (thanks to Boden Russell for pointing it out);


During the summit a colleague in the Product Working Group suggested that given the mailing list would be quiet, perhaps I could summarise all the Etherpads.  I looked into this and, well, doing it comprehensively is a task beyond my modest abilities and available time, but a few pointers might be of use;

Note that many of the per project summaries mentioned above that went to the OpenStack-dev mailing list link to the relevant Etherpad(s) too.


by hugh at May 17, 2016 01:29 AM

OpenStack Superuser

Using Kubernetes to deploy OpenStack


Alex Polvi, CEO of CoreOS, talks with Mark Collier of the OpenStack Foundation about using Kubernetes to deploy OpenStack. Polvi refutes the idea that Kubernetes and OpenStack are at odds, shares more about his keynote demo and explains GIFEE (Google infrastructure for everyone else).

by Superuser at May 17, 2016 12:01 AM

May 16, 2016


Cloudwatt updates its Hadoop-as-a-Service / Analytics-as-a-Service product!

Cloudwatt’s Big Data infrastructure management product is updated and online. It is based on the latest OpenStack release, called Mitaka. Major updates:

  • Enhanced customer experience based on Horizon Mitaka, more intuitive and with step-by-step wizards…
  • Latest hadoop distribution versions (Spark, Cloudera, MapR)
  • Last but not least, the Hortonworks distribution is not correctly supported on the latest release

Hadoop distribution lineup details:

  • Hortonworks 2.3
  • Spark 1.6.1
  • Cloudera 5.5
  • MapR 5.1
  • Storm 0.9

by Alvin Heib at May 16, 2016 10:00 PM


It’s not really “serverless” computing:’s Ivan Dwyer

The post It’s not really “serverless” computing:’s Ivan Dwyer appeared first on Mirantis | The Pure Play OpenStack Company.

OpenStack:Unlocked Podcast hosts Nick Chase and John Jainschigg interview’s Director of Business Development, Ivan Dwyer at the OpenStack Summit in Austin last month.  They discuss:

  • What “serverless” computing is
  • An explanation of and the different modes it provides
  • How to use containers to (reasonably) painlessly implement a microservices workflow
  • How containers and OpenStack fit together
  • How scales large amounts of container-based jobs
  • Who Ivan thinks we should have on the podcast
  • And the identity of the handsome fellow in the hat

You can find all of the OpenStack:Unlocked podcasts here, or check out the full Ivan Dwyer interview:


The post It’s not really “serverless” computing:’s Ivan Dwyer appeared first on Mirantis | The Pure Play OpenStack Company.

by Nick Chase at May 16, 2016 09:40 PM

OpenStack Nova Developer Rollup

API-Ref Daily Update

Today’s API-Ref update!

[Figure: api-ref status for May 16, 2016]

Click here for more information about the API-Ref project.

by auggy at May 16, 2016 07:40 PM

OpenStack Blog

OpenStack Developer Mailing List Digest May 7-13

SuccessBot Says

  • Pabelanger: bare-precise has been replaced by ubuntu-precise. Long live DIB
  • bknudson: The Keystone CLI is finally gone. Long live openstack CLI.
  • Jrichli: swift just merged a large effort that started over a year ago that will facilitate new capabilities – like encryption
  • All

Release Count Down for Week R-20, May 16-20

  • Focus
    • Teams should have published summaries from summit sessions to the openstack-dev mailing list.
    • Spec writing
    • Review priority features
  • General notes
    • Release announcement emails will be tagged with ‘new’ instead of ‘release’.
    • Release cycle model tags now say explicitly that the release team manages releases.
  • Release actions
    • Release liaisons should add their name and contact information to this list [1].
    • New liaisons should understand release instructions [2].
    • Project teams that want to change their release model should do so before the first milestone in R-18.
  • Important dates
    • Newton 1 milestone: R-18 June 2
    • Newton release schedule [3]

Collecting Our Wiki Use Cases

  • From the beginning, the community has used a wiki [4] as the default community information publication platform.
  • There’s a struggle with:
    • Keeping things up-to-date.
    • Prevent from being vandalized.
    • Old processes.
    • Projects that no longer exist.
  • This outdated information, which search engines still reference, can make the wiki confusing to use, especially for newcomers.
  • Various efforts have happened to push information out of the wiki to proper documentation guides like:
    • Infrastructure guide [5]
    • Project team guide [6]
  • Peer reviewed reference websites:
  • There are a lot of use cases for which a wiki is a good solution, and we’ll likely need a lightweight publication platform like the wiki to cover them.
  • If you use the wiki as part of your OpenStack work, make sure it’s captured in this etherpad [9].
  • Full thread

Supporting Go (continued)

  • Continuing from previous Dev Digest [10].
  • Before Go 1.5 (without the -buildmode=shared option), Go didn’t support the concept of shared libraries. As a consequence, when a library upgrades, the release team has to trigger a rebuild for each and every reverse dependency.
  • In Swift’s case for looking at Go, it’s hard to write a network service in Python that shuffles data between the network and a block device and effectively use all the hardware available.
    • Fork()’ing child processes using cooperative concurrency via eventlet has worked well, but managing all async operations across many cores and many drives is really hard. There’s not an efficient interface in Python. We’re talking about efficient tools for the job at hand.
    • Eventlet, asyncio or anything else single threaded will have the same problem of the filesystem syscalls taking a long time and the call thread can be blocked. For example:
      • Call select()/epoll() to wait for something to happen with many file descriptors.
      • For each ready file descriptor, if the file descriptor socket is readable, read it, otherwise EWOULDBLOCK is returned by the kernel, and move on to the next file descriptor.
  • Designate team explains their reasons for Go:
    • MiniDNS is a component that due to the way it works, it’s difficult to make major improvements.
    • The component takes data and sends a zone transfer every time a record set gets updated. That is a full (AXFR) zone transfer where every record in a zone gets sent to each DNS server that end users can hit.
      • There is a DNS standard for incremental change, but it’s complex to implement, and can often end up reverting to a full zone transfer.
    • Ns[1-6] may be tens or hundreds of servers behind anycast IPs and load balancers.
    • Internal or external zones can be quite large. Think 200–300 MB.
    • A zone can have high traffic where a record is added/removed for each boot/destroy.
    • The Designate team is small, and after looking at options, judging the amount of developer hours available, a different language was decided.
  • Looking at Designate’s implementation, there are some low-hanging-fruit improvements that can be made:
    • Stop spawning a thread per request.
    • Stop instantiating Oslo config object per request.
    • Avoid 3 round trips to the database every request. The majority of the request time here is not spent in Python. This data should be trivial to cache, since Designate knows when to invalidate the cache data.
      • In a real world use case, there could be a cache miss due to the shuffle order of multiple miniDNS servers.
  • The Designate team saw 10x improvement for 2000 record AXFR (without caching). Caching would probably speed up the Go implementation as well.
  • Go historically has poor performance with multiple cores [11].
    • Main advantages with the language could be CSP model.
    • Twisted does this very well, but we as a community consistently support eventlet. Eventlet has a threaded programming model, which is poorly suited for Swift’s case.
    • PyPy got a 40% performance improvement over CPython for a benchmark of Twisted’s DNS component 6 years ago [12].
  • Right now our stack already has dependencies on C, Python, Erlang, Java, Shell, etc.
  • End users emphatically do not care about the language API servers were written in. They want stability, performance and features.
  • The infrastructure-related issues with Go for reliable builds, packaging, etc. are being figured out [13]
  • Swift has tested running under PyPy with some conclusions:
    • Assuming production-ready stability of PyPy and OpenStack, everyone should use PyPy over CPython.
      • It’s just simply faster.
      • There are some garbage collector related issues to still work out in Swift’s usage.
      • There are a few patches that do a better job of socket handling in Swift that runs better under PyPy.
    • PyPy only helps when you’ve got a CPU-constrained environment.
    • The GoLang targets in Swift are related to effective thread management syscalls, and IO.
    • See a talk from the Austin Conference about this work [14].
  • Full thread


by Mike Perez at May 16, 2016 03:36 PM

Hugh Blemings



Welcome to Last week on OpenStack Dev (“Lwood”) for the week just past. For more background on Lwood, please refer here.

Basic Stats for week 9 May to 15 May 2016:

  • ~800 Messages (up about 21% relative to last week)
  • ~233 Unique threads (up about 13% relative to last week)

Busiest week on the list since I started doing Lwood back at the end of June 2015.

Notable Discussions

Future of Cross Project Meetings

Mike Perez provides an update on the future of Cross Project (IRC) meetings.  Now that some momentum has been built, the meetings will move to a little more of a self-service model.

In particular Mike will no longer be announcing if the meetings are -not- occurring – instead folk interested in fixing a particular cross-project issue or feature should introduce a meeting following the process Mike outlines.

The expectation is that most CP meetings will now bring together a subset of projects rather than all of them.

Minor Tweak to automated release announcement emails

Doug Hellmann points out that a recently made change to the script that generates automated release announcements means that the subject line will now include “[new]” in place of the “[release]” tag.

Do you use the wiki? Please tell us more…

Thierry Carrez notes that there are moves afoot to better fine tune the way the Wiki operates and what it’s used for to make it more useful and less of a magnet for Spam.  To that end he asks people to take a few minutes and describe what/where/how they use the Wiki – write up their Wiki use cases in effect.  Please contribute :)

The Monster Thread…

No summary of OpenStack related goings on would be complete without noting a lengthy thread that kicked off a few weeks back and at the time of writing is still going.

The early part of the thread started here – a note from John Dickinson about the Swift team’s plans to code portions of Swift in the Go language (mentioned in last week’s Lwood).  At Thierry Carrez’s suggestion, the process of seeking Technical Committee approval to add Go to the list of supported languages for OpenStack was commenced.

That thread is up to 141 messages and counting and has been, ahem, spirited in places.  Most of the discussion has been around the core question – using Go to code up parts of OpenStack, as distinct from whether a Go-based API should be made available for OpenStack.  The latter seems to have firmly converged on No/Not Relevant.

The applicability/appropriateness of using Go (or other additional languages) at the core of OpenStack has been the predominant topic as well as (best I can tell) some pretty useful discussion about why Go is thought necessary (versus Python, coding critical sections in C or some such) etc.

It will be interesting to see how this thread pans out as it enters its second week and (possibly) its second hundred messages… :)

Austin OpenStack Summit Wrapup – Part III

While not quite as much traffic as last week, a healthy amount of post-Summit summary discourse on the list last week.

Since the summaries have already spanned three weeks’ worth of Lwood, I’m pulling together a consolidated list which will go out later this week – you’ll be able to find it linked from here.

Summaries on List

My thanks to Thierry Carrez for a tweet noting his appreciation for these summaries and for some subsequent retweets – thus encouraged will do ‘em again next time :)

Upcoming OpenStack Events

A few midcycles being organised already


Don’t forget the OpenStack Foundation’s comprehensive Events Page for a comprehensive list that is frequently updated.

People and Projects

PTL/Core nominations & changes

Further Reading & Miscellanea

Don’t forget these excellent sources of OpenStack news – most recent ones linked in each case

This edition of Lwood brought to you by Santana (Sacred Fire), Thin Lizzy (Johnny The Fox), Tommy Emmanuel (The Journey),  Baby Animals (Early Warning, One Word), The Bottom 40 (Covering Happy by Pharrell Williams), Joe Walsh (Live From Daryl’s House: Funk 49-50 and Rocky Mountain Way) amongst other tunes.


by hugh at May 16, 2016 10:53 AM

Cross project collaboration, release naming, and more OpenStack news

Here's what's happening this week in OpenStack, the open source cloud infrastructure project.

by Jason Baker at May 16, 2016 06:59 AM

OpenStack @ NetApp

Automating IaaS for DevOps on FlexPod with Puppet

Hi! I'm Amit Borulkar, an engineer here at NetApp in the Converged Infrastructure group. My interactions and conversations with customers yield a common trend that I've observed over the past year: the walls between development and operations are being torn down by DevOps practices.

Once those customers implement DevOps practices, they achieve the following:

  • increased agility
  • better operational efficiency
  • more frequent release cycles

All aspects of the development cycle can be accelerated: source code management, the build lifecycle, quality assurance, and the deployment itself can be intricately stitched together via an automation and orchestration framework.

The infrastructure component setup can be automated in the very same manner as the development cycle!

How can infrastructure be automated?

Automation typically involves writing scripts in an imperative programming style, where the author specifies a sequence of actions (using the APIs exposed by infrastructure components) for configuring the infrastructure components to a desired end state. To make infrastructure provisioning robust, these scripts need to identify the initial state of the infrastructure and accordingly invoke an appropriate sequence of logic to transition the infrastructure to a desired end state.

  • A workflow might involve multiple resources. A script that invokes the proper logic for every possible combination of the resources’ initial states can become very complex.
  • Writing comprehensive logic for multiple workflows might be a daunting task.
  • The logic would also need to handle resource dependency tracking among different infrastructure components. (For example, disk space for a virtual machine has to be provisioned before the machine is created.)
  • The results of these scripts need to be consistent across multiple runs and across different infrastructure regions.

Writing these scripts can be cumbersome and error-prone, and they often require frequent changes to keep up with the dynamic nature of the infrastructure. This situation might lead to an automation paradox, where operations staff spend more time managing these scripts than solving other business problems.

Declarative, state-based automation solves the problem of having to maintain custom scripts. The end user declares the desired end state without having to specify (or even know) the underlying implementation details. Manifests describe the desired end state of the system (in an easy-to-grasp, XML/JSON-like syntax) using resources and their attributes. Operations staff declare the desired end state in a manifest file, and the intricacies, such as invoking the proper APIs based on the current infrastructure state and tracking resource dependencies, are abstracted from the operator. Bottom line: let the automation take care of that for you.

These manifests can be version controlled using source management tools such as Git, which also provides an accurate audit trail of when and where changes to the infrastructure were made. Does this mean that the infrastructure configuration itself is now version controlled? Yes! You can reliably switch between different configurations and maintain consistency across your entire infrastructure, no matter how many components you manage.
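A minimal sketch of that workflow, with illustrative repository and file names:

```shell
# Keep infrastructure manifests under version control: every change to the
# desired end state becomes a commit, i.e. an audit record.
mkdir -p infra-manifests && cd infra-manifests
git init -q
cat > site.pp << 'EOF'
# Desired end state of the infrastructure goes here.
EOF
git add site.pp
git -c user.name=ops -c user.email=ops@example.com \
    commit -q -m "Initial storage manifest"
git log --oneline   # the audit trail: who changed what, and when
```

Rolling the infrastructure back to an earlier configuration is then just checking out an earlier commit and re-applying the manifest.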

Here is a sample manifest that creates a NetApp FlexVol and a LUN on a storage appliance running NetApp clustered Data ONTAP:

node '' {
  # Create a volume
  netapp_volume { 'vol1_iscsi':
    ensure       => present,
    aggregate    => 'aggr02_node01',
    initsize     => '1g',
    state        => 'online',
    exportpolicy => 'exp_vserver01',
  }

  # Create a LUN
  netapp_lun { '/vol/vol1_iscsi/vserver01_lun':
    ensure          => present,
    ostype          => 'windows',
    size            => '400m',
    spaceresenabled => 'false',
  }
}

How does FlexPod enable state-based provisioning?

FlexPod is a pre-validated converged infrastructure platform powered by NetApp and Cisco with compute, network and storage components bundled together, which serves as a building block for your datacenter infrastructure. To illustrate the concept of declarative state-based provisioning of infrastructure resources, we will use Puppet’s integration with the storage, network and hypervisor components of FlexPod.

State diagram

Provisioning Storage

NetApp and Puppet have jointly developed an open-source module for managing the configuration of NetApp's clustered Data ONTAP operating system. The module has a comprehensive set of resource types, which allows the management of aggregates, storage virtual machines (SVMs), network interfaces, and many other aspects of the cluster. The module can also enable access to storage resources using exports, shares, and LUNs. It also lets you manage your data seamlessly across both on-premises environments and the public cloud, using NetApp's SnapMirror protocol to interconnect the different environments, enabling NetApp's vision of a Data Fabric.

Complete solution architecture diagrams, information about supported resource types, sample manifests for cluster-scoped operations and SVM-scoped operations for using Puppet with NetApp FAS Storage can be found in a recently published NetApp Technical Report - Using Puppet to Manage Your NetApp Storage.

Be sure to also check out the Puppet modules for other NetApp storage offerings like E-Series/EF-Series and SolidFire.

Managing the Network

Cisco Systems was one of the first vendors to leverage the Puppet device functionality to administer devices that cannot natively run a Puppet agent. The module allows you to configure VLANs and interfaces consistently across different Ethernet switches. A detailed list of the features supported by the module across different platforms can be found here.

Deploying Virtual Infrastructure

VMware vSphere ESXi is one of the most commonly deployed hypervisors across datacenter environments for server virtualization. When it comes to provisioning of virtual infrastructure, VMware has released a VMware vCenter module available via the Puppet Forge. The module allows you to create Datacenters within vCenter, add ESXi hosts, add clusters, manage Datastores, and more.

Note: The modules might not provide resource types for each and every infrastructure feature, but they support the most commonly performed operations, which is sufficient to establish a base infrastructure. Due to the modular nature of the Puppet type/provider interface, new resource types can be easily added without affecting existing ones.

What does all this mean?

The result of declarative configuration management using Puppet is a set of manifests that describe the intended end-state configuration of each infrastructure component. Take, for instance, a Continuous Integration / Continuous Deployment (CI/CD) workflow: manifest templates would be used to provision storage (such as a LUN or NFS share containing the source code), which could then be mapped to a VM owned by a developer. The manifest templates could also describe replicating build and test servers on demand from a "golden" test/build server image. Another use case applies to the concept of hybrid cloud: data can be moved from on-premises to one or more Cloud ONTAP instances, or to NetApp Private Storage (NPS) for Cloud, using the Puppet module's "netapp_snapmirror" resource type.

Manifests of small tasks can be grouped together to form a catalog that corresponds to an Infrastructure-as-a-Service (IaaS) workflow. The operator is only responsible for changing variable names and applying the catalog. Let's look at a sample catalog that corresponds to a simple end-to-end provisioning workflow. The catalog consists of two manifest files, one for managing clustered Data ONTAP resources (creating volumes, enabling the NFS service, creating an NFS export policy and rules, and so on):

node '' {
  # Create an export policy
  netapp_export_policy { 'nfs_export':
    ensure => present,
  }

  # Add rules to the export policy
  netapp_export_rule { 'nfs_export:1':
    ensure            => present,
    clientmatch       => '',
    protocol          => ['nfs'],
    superusersecurity => 'none',
    rorule            => ['sys', 'none'],
    rwrule            => ['sys', 'none'],
  }

  # Enable the NFS server on the Storage Virtual Machine (SVM)
  netapp_nfs { 'vserver01':
    ensure => present,
    state  => 'on',
  }

  # Create a NetApp FlexVol
  netapp_volume { 'volume_nfs':
    ensure       => present,
    aggregate    => 'aggr01_node02',
    initsize     => '1g',
    junctionpath => '/vol1_nfs',
    exportpolicy => 'nfs_export',
    state        => 'online',
  }

  # Modify the root NetApp FlexVol policy
  netapp_volume { 'rootdir':
    ensure       => present,
    exportpolicy => 'nfs_export',
  }
}

The other Puppet manifest manages resources in VMware vCenter (items such as creating a Datacenter, adding hosts, adding NFS datastores, and so on):

node '' {
  include 'vcenter::package'

  # Create a transport type for connecting to VMware vCenter
  transport { 'vcenter':
    username => 'administrator@vsphere.local',
    password => 'abcxyz123',
    server   => '',
    options  => { 'insecure' => true },
  }

  # Create a Datacenter in VMware vCenter
  vc_datacenter { 'DC_NFS':
    ensure    => present,
    path      => '/DC_NFS',
    transport => Transport['vcenter'],
  }

  # Add a host to the Datacenter
  vc_host { '':
    ensure    => present,
    path      => '/DC_NFS',
    username  => 'root',
    password  => 'abcxyz123',
    transport => Transport['vcenter'],
  }

  # Mount the Datastore to the VMware vSphere ESXi host
  esx_datastore { '':
    ensure      => present,
    type        => 'nfs',
    remote_host => '',
    remote_path => '/vol1_nfs',
    transport   => Transport['vcenter'],
  }
}

Operations staff apply the Puppet manifests by running a single command. The infrastructure components (storage and virtualization) are then configured according to the contents of the manifest, in some cases in less than 30 seconds.
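As a sketch, that single command looks like this. The manifest path is illustrative; `puppet device` is the variant used for nodes, such as switches and storage controllers, that cannot run a Puppet agent themselves:

```shell
# Apply a manifest directly on a node that runs the Puppet agent:
puppet apply site.pp

# For devices that cannot run an agent (storage arrays, switches),
# a proxy host applies the catalog over the device's management API:
puppet device --verbose
```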

These catalogs can be extended further to provide multi-tier service offerings (with a catalog corresponding to each service level), so that infrastructure provisioning is managed through a single manageability plane powered by Puppet.


The ability of IT to adapt to changing business requirements plays a critical role in the success of an enterprise. Declarative ("tell me what you want") automation of infrastructure provisioning increases operational agility by ensuring quick and reliable transitions between different infrastructure configurations.

Infrastructure provisioning through Puppet also enables a centralized resource management plane for managing FlexPod resources (storage, network, and compute), thereby enabling a truly Software Defined Data Center (SDDC).

May 16, 2016 12:00 AM

May 15, 2016

Gal Sagie

Kuryr and Neutron Existing Resources

If you don't know what Kuryr is by now, shame on you! Please check out our OpenStack Tokyo Kuryr introduction talk or read my blog post about it.

Kuryr Current Status

Mitaka was the first release in which we fully worked on Kuryr, and I want to describe some of the nice features we support in the current version.

In this post I will describe the ability to attach elements to existing Neutron/OpenStack resources, which is useful for many different use cases. In the next part I will describe our Kubernetes integration and nested containers/Magnum support, two really nice features that are in late stages of progress.

The following diagram depicts the current components we have in Kuryr:

As you can see, Kuryr already has full integration with Docker libnetwork, and with Docker Swarm as a by-product. What we essentially do is implement a libnetwork remote driver and IPAM driver that map Docker calls onto Neutron resources and its model.

This works well: as a user creates new resources like networks in Docker, these networks are created in Neutron on the fly, and Kuryr keeps the mapping between the two entities. However, we noticed that we can provide much more by allowing users to attach their newly created Docker network to an existing Neutron network.

The Use Cases

Connecting VMs, Containers and Bare Metal

Attaching to existing networks enables something users really need and want: the ability to connect VMs, containers, and bare metal servers to the same virtual network with a seamless management experience and consistent networking across all three.

There are many reasons why you would want to do that, and no good reason why you shouldn't be able to: connecting your OpenStack and container workloads together and applying unified security, isolation, and policy profiles to them.

This simple feature in Kuryr enables all of the above using Neutron, as we will soon see.

Bulk creations

Another use case I hear a lot about is users who want to deploy batches of containers as fast as possible, sometimes with many different networks and newly created ports.

What we noticed is that the API calls to Neutron in these cases can be time consuming and slow the process down; this is a common problem for any networking plugin implementation.

By pre-allocating networks/ports in Neutron and then only binding these elements from their pools when needed (when the user uses the Docker API, or any orchestration/management tool that does), we are able to speed this process up quite significantly.

The ability to attach to existing Neutron resources enables us to achieve this.

The Flow

The following steps demonstrate how to use this feature with Kuryr. First, we create a Docker network using the Docker CLI, telling Docker to use Kuryr as both the networking driver and the IPAM driver.
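That step looks roughly like this (the subnet and network name are illustrative; the `kuryr` driver names match what kuryr-libnetwork registers, but check your installed version):

```shell
# Create a Docker network handled by the Kuryr remote driver and IPAM driver:
docker network create --driver kuryr --ipam-driver kuryr \
    --subnet 10.10.0.0/24 foo

# On the Neutron side, a matching network should now exist:
neutron net-list
```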

If we look at Neutron, we can see that a new network was created with a “kuryr” prefix in its name.

We can see that Kuryr saves the mapping between the Neutron network ID and the Docker network ID using a new Neutron feature called resource tags. This feature was designed by the Kuryr team and implemented in Neutron exactly for use cases like this (and many others, as described in the spec).

You can read more about this feature in this document.

What we see above is that the user creates a network in Docker, and it is automatically added to Neutron by Kuryr with the appropriate mapping; any new container added to this Docker network will be attached to the same Neutron network.

But what if there is already a Neutron network with a few VMs attached to it? The user can specify the network name or network ID as a Docker option, and Kuryr, instead of creating a new Neutron network, will attach the containers to the existing network.
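As a sketch, the Docker option that selects an existing Neutron network looks something like this. The `neutron.net.name` option name is an assumption based on how kuryr-libnetwork exposes it; verify against the documentation for your Kuryr version:

```shell
# Attach a new Docker network to a pre-existing Neutron network
# ("existing_net", illustrative) instead of letting Kuryr create one:
docker network create --driver kuryr --ipam-driver kuryr \
    --subnet 10.10.0.0/24 \
    -o neutron.net.name=existing_net bar
```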

We can see that Kuryr keeps a special tag on the network to indicate that it was not created by Kuryr. This is important: we don't want to delete this network when the user deletes the Docker network, just remove the associations (the mapping tags on the Neutron network).


Currently, Kuryr only supports attaching to existing networks from one Docker control plane, for example from a single Docker Swarm environment. However, we see strong use cases where users may want to connect containers to the same network from, say, two different Docker Swarm instances.

Enabling this is rather simple with the above feature; we will just need to manage tags per Docker Swarm environment, and this is something we are looking to support really soon.


I think this post demonstrates the power of Kuryr and how we can simplify life for users who deploy OpenStack workloads mixed with container workloads.

In the next posts I am going to describe other exciting features we are working on, like Kubernetes integration and nested containers support.

If you want to take part in our journey, feel free to reach out to me or join our IRC channel on freenode (#openstack-kuryr).

May 15, 2016 11:25 PM

David Moreau Simard

What did everyone do for the Mitaka release of OpenStack?

Just what did everyone do for the Mitaka OpenStack release?

RDO community liaison Rich Bowen went to find out.

He interviewed some developers and engineers that worked on OpenStack and RDO throughout the Mitaka cycle and asked them what they did and what they were up to for the Newton cycle.

It was definitely a good idea and I hope we do the same retrospective for Newton!

Here’s a quick summary of who participated and what they talked about.

David Moreau Simard


Yep, that’s me! I talk about the great improvements around the testing coverage we did for RDO, allowing us to ship the most stable release of RDO yet. I also explain what my wishlist is for the Newton cycle.

Adam Young


Adam Young is part of the core Keystone project team. He goes in depth about the challenges they had in Mitaka and what he’d like to see happen in Keystone for Newton.

Emilien Macchi


Emilien Macchi is the PTL for the Puppet OpenStack deployment project. He talks about the challenges of integration testing their puppet modules and keeping up the pace with the master branches.

Javier Pena


Javier Peña is a Senior Software Engineer at Red Hat and is part of my team involved around making RDO awesome.

He discusses the challenges around building RDO packages and making sure they’re available to the community as well as the features and improvements that were done to Packstack.

Haïkel Guémar


Haïkel Guémar is a Senior Software Engineer at Red Hat and is part of my team involved around making RDO awesome.

He discusses the focus around streamlining the contribution workflow to RDO and how he’d like to do some things better for Newton.

Chandan Kumar


Chandan Kumar is a Software Engineer at Red Hat and he’s very involved in the RDO community.

He talks about some of the new packages we needed to add throughout Mitaka in RDO and upcoming projects he’d like to add in Newton.

Ihar Hrachyshka


Ihar Hrachyshka is part of the Neutron core contributors team.

He explains the work that was done to make upgrading Neutron easier and some other cool features they managed to land in time for the release.

Ivan Chavero


Ivan Chavero is also a Senior Software Engineer part of my team focused on RDO. He explains how the Mitaka cycle was great from the perspective of stability improvements for Packstack and how we fixed a lot of bugs.

by dmsimard at May 15, 2016 02:00 PM

May 13, 2016

Ben Nemec

Musings on the Austin Summit

This is going to be a bit of a different summit recap than my previous ones. Since Steve Hardy has already posted an excellent summary of the major TripleO topics from this summit, I'm not going to duplicate all of that here.

Instead, this will be a more personally focused post, wherein I reflect on my experience at this summit and how it compared to previous ones. If that kind of navel-gazing sounds incredibly uninteresting, then you might want to pull the rip cord now. :-)

Part of the reason I decided to sit down and write this was that looking back on the week there were some interesting themes that stood out to me. Here are a few words about some of them:

Engaging with the Broader Community

One of my major goals for this summit was to do a better job of engaging with people outside my immediate team, both in the sessions and in the more informal settings where the real work gets done. ;-) In Vancouver and Tokyo, I came away feeling somewhat like I had spent a week talking to the same people that I chat with on IRC or have phone calls with on a regular basis. While this is valuable, and I still had plenty of interaction with those people in Austin, it also feels like I'm missing out on something when there are thousands of other interesting people at the event. Talking OpenStack with random people I've never met before is something I've always enjoyed at previous summits (a conversation in the underwater tunnel at the Georgia Aquarium is probably still the highlight), and I was pleased to see it make a return this time around.

Hallway Track

Somewhat related to the previous point, a lot of my best discussions this time around happened outside of sessions. This isn't necessarily unusual, but it is a marked change from the first summit I went to, which I take as a sign that I have improved at taking advantage of the whole summit environment. I wish I could write a "How To Take Advantage of the Hallway Track" post, but I don't think I can quantify how I did it. I feel like a lot came down to my aforementioned focus on engaging with the community. As much as it pains my introverted self to step out of my comfort zone in social situations, it is worth it. It also helps to be at your fifth summit and have worked on multiple teams for multiple companies though, so I think to some extent the hallway track just takes practice.


I hadn't really planned on pushing OVB yet at this summit because it still required some hacks on the underlying cloud. Then things came together shortly before the event and I was able to do a full OVB deployment on a stock, unmolested TripleO deployment (dedicated blog post about this to come later). This is pretty game-changing as it means it is now theoretically possible to get OVB into the regular upstream infra. There are still significant hurdles to overcome and it may not happen anytime soon, but at least I know it can be done.

And I think there would be significant value to the OpenStack community if we could make that happen. TripleO is already planning to move to OVB for its CI testing, and there were at least two other sessions I attended where I think OVB could be an excellent solution to the problems discussed. The first was the deployment tools session, where a number of the attendees expressed concern at not being able to do realistic multi-node baremetal-style deployments. This is exactly the problem OVB was designed to address.

The other session where I think OVB could be very helpful was the Ironic CI discussion. One of the current blockers in Ironic CI is the ability to do a full Tempest run in CI using the Ironic driver. Due to the current need to use nested virt, full Tempest takes something like 4.5 hours to run, which is long enough to make even the marathon TripleO CI jobs blush. OVB would allow Ironic to use first-level VMs for booting instances during the Tempest tests, which would almost certainly slash the time taken.

Again, this isn't going to happen tomorrow, but these discussions further convinced me that non-TripleO-specific OVB is something we should pursue. It has the potential to benefit OpenStack in a big way.

My Oslo Knowledge Isn't Completely Stale Yet

Just what it says. While I'm no longer up on the most current events in Oslo (although I do still follow the Oslo-related discussions and specs), there are a number of efforts still ongoing that relate to things I was involved with in the past. It was kind of nice to still be able to contribute to the Oslo projects in some small ways. Maybe for Newton I'll finally have time to get back to contributing to Oslo on a more regular basis. Of course, I've said that for about the past three summits, so believe it when you see it. ;-)


I had an interesting experience this summit: About ten seconds after I walked into the venue for the first time on Monday, I ran into no less than three people that I knew. Not shocking by itself - I've been around OpenStack long enough that I know a fair number of people involved with it. The interesting part was that none of these people were developers, and two of the three I had met when they worked for different companies. I don't have any deep meaning to attach to this (it's mostly a result of my having done a number of (potential-)customer visits last year, and this being the first U.S. summit since then), but it was kind of a fun Small World moment, even as OpenStack has grown to be anything but small.


It's become a trend lately that I question whether I need to do these summit recap posts when I get home. After all, there's always a ton of other people blogging about summit, many of whom are writing about topics that overlap what I would cover. I keep doing it anyway because even if I feel like a summit was not as productive as I wanted it to be, when I sit down to write my recap I realize just how much we actually accomplished, even if it wasn't all we wanted to. So I'll probably keep doing these just for the warm fuzzies, and if I manage to accidentally stumble on some nuggets of wisdom that are useful to someone else then so much the better. :-)

by bnemec at May 13, 2016 07:56 PM

OpenStack Nova Developer Rollup

API-Ref Daily Update

Recently, Nova moved our API documentation from WADL XML format to RST, a human readable text markup common in the Python community. We also moved these documents into our Nova source tree and out of the central documentation repository. As part of this effort, we are verifying the correctness of this API documentation. See the full details from the original mailing list post.

Due to the large collaborative effort, keeping track of what's being done is pretty difficult. I started a spreadsheet populated via a script, originally written by Sean Dague, that I modified. Because the commits are all over the place, I have to manually parse and enter much of the data. I run the script in the morning to check for updates and then update the main grid with anything new being worked on. It currently takes me about 15 minutes, which is why I haven't tried to automate more than I already have.

I’ll post the updated chart here each morning when I update my spreadsheet.

[Figure: api-ref status for May 13, 2016 (click for larger image)]


by auggy at May 13, 2016 05:42 PM

Tesora Corp

The Short Stack: Inclusivity Initiatives in the OpenStack Community, OpenStack Debunks Five Common Misconceptions, and Jumping Back on the OpenStack Bus

Welcome to the Short Stack, our regular feature where we search for the most intriguing OpenStack news. These links may come from traditional publications or company blogs, but if it’s about OpenStack, we’ll find the best ones.

Here are our latest links:

Inclusivity initiatives in the OpenStack community | OpenStack Superuser

Kavit Munshi of the Diversity Working Group and Nithya Ruff of the Women of OpenStack working group sit down at OpenStack Summit to discuss the different inclusivity initiatives in the OpenStack community, the results of the diversity survey and several new programs introduced at the Austin Summit. These programs will address diversity findings, and include a mentoring program and technical tutorials.

OpenStack Debunks Five Common Misconceptions | Talkin’ Cloud

After OpenStack Summit in Austin two weeks ago, Nicole Henderson discussed five recurring questions surrounding OpenStack. The OpenStack Foundation’s VP of Marketing & Community Services, Lauren Sell, addressed five of the most common questions heard in Austin. One of the most popular questions was whether or not OpenStack is competing with public clouds; Sell said that to compare OpenStack private cloud to public cloud is ‘wrong on a couple of levels.’

Jumping back on the (OpenStack) Bus in Austin | Computerworld

Jessica Burton remarked on the enormous growth of the OpenStack Summit attendance over the last six years. Starting with just 75 attendees in Austin in 2010, the conference grew to 7,500 just two weeks ago. Burton stressed the diversity of ‘riders’ on the OpenStack ‘bus’ as well as their ingenuity, including AT&T, who presented at the Summit. She also noted that while presentations from enterprises were impressive and plentiful, those from mid-size and smaller companies were difficult to find.

Seven Things OpenStack DBaaS Can Do that AWS Cannot | Data Center Knowledge

Tesora CEO Ken Rugg discussed seven advantages of OpenStack Database as a Service (DBaaS) over Amazon Web Services (AWS). Rugg said that although Amazon’s DBaaS offerings are seeing undeniable growth, there are certain situations where OpenStack Trove’s DBaaS offerings present a clear advantage. Among many, these include the facts that OpenStack Trove runs in the data center as well as the public cloud, provides access to source code, and allows the flexibility to update software on your own timeline.

OpenStack Mitaka aims to make open source easy-peasy | Tech Target

Jason Sparapani discussed the positive buzz surrounding the OpenStack Mitaka release regarding its easy installation, use, and management. While large enterprises like AT&T and eBay are using OpenStack for its flexibility and low-cost processing power, the Mitaka release dispelled many previous worries about OpenStack requiring extensive installation, maintenance, and development support.

The post The Short Stack: Inclusivity Initiatives in the OpenStack Community, OpenStack Debunks Five Common Misconceptions, and Jumping Back on the OpenStack Bus appeared first on Tesora.

by Alex Campanelli at May 13, 2016 03:01 PM

Rackspace Developer Blog

Run keystone/horizon under NGINX on Ubuntu 16.04

Run OpenStack Keystone and Horizon using NGINX on Ubuntu 16.04

I previously wrote an article showing how to move OpenStack from an Apache server to NGINX for both Keystone and the Horizon interface. Since that article was written, OpenStack has moved to the Mitaka release, and Ubuntu has released a new long term support version, "Ubuntu 16.04 - xenial". These two releases bring a number of changes to the configuration. In this article, I show you how to make the transition to NGINX on these newer releases.

The article assumes you have a working OpenStack cluster, running the Mitaka release on Ubuntu 16.04. All work will be performed on the controller node for those developers using a multi-node OpenStack cluster.

First, stop the running keystone and apache services:

service apache2 stop
service keystone stop
systemctl disable apache2.service

Apache supports wsgi directly; NGINX, however, has no built-in wsgi support. Instead, several projects bring wsgi functionality to NGINX. We will use the uwsgi packages provided by Ubuntu. Install the NGINX server and the other required packages:

apt-get install -y nginx libgd-tools nginx-doc python-django-uwsgi uwsgi uwsgi-core uwsgi-emperor uwsgi-plugin-python

Since Ubuntu usually starts services when the package is installed, stop the nginx service until we get it configured:

service nginx stop

Since keystone and horizon run behind the uwsgi service, disable these services so they don't start as systemd services:

systemctl disable keystone
systemctl disable horizon

We are not running a simple web server, so disable the default site that comes with the NGINX install:

rm /etc/nginx/sites-enabled/default

Make log directories for keystone and horizon under NGINX, and set the proper permissions. (In this configuration, nginx runs as the www-data user, the uwsgi keystone process runs as the keystone user, and the horizon (django) uwsgi process runs as the horizon user.)

mkdir /var/log/nginx/keystone
mkdir /var/log/nginx/horizon
chown www-data:adm /var/log/nginx/keystone/
chown www-data:adm /var/log/nginx/horizon/

Setup log directories for the uwsgi processes:

mkdir /var/log/keystone
mkdir /var/log/horizon
chown keystone:keystone /var/log/keystone
chown horizon:horizon /var/log/horizon

and a base directory for the keystone wsgi python script:

mkdir /var/www/keystone

Keystone comes with a python script for interfacing with servers running a wsgi interface. Keystone listens on two tcp ports, one for processing admin-level requests and one for requests that don't need admin-level permissions. If you installed keystone from the Ubuntu package tree, this file was not included in the keystone packages, so you will need to download it. If you installed OpenStack from source, you just need to copy the file as shown below. One copy will handle requests that need the keystone admin role, and the other copy is for non-admin requests.

To download it use:

wget -O /var/www/keystone/admin
wget -O /var/www/keystone/main

If you installed keystone from source, cd into the keystone source directory and run:

cp keystone/httpd/ /var/www/keystone/admin
cp keystone/httpd/ /var/www/keystone/main

Now set the proper permissions on the file, so that the uwsgi keystone user can access it and it has execute permissions:

chown -R keystone:keystone /var/www/keystone
chmod ug+x /var/www/keystone/*

uwsgi has a broad set of configuration parameters. We are only going to set the minimum number of values needed to get this configuration running. Read the uwsgi documentation to ensure proper security is set up and the best performance values are selected for your environment. We set uwsgi to run 10 processes with 2 threads per process. The keystone process runs as the keystone user and the www-data group. We will also be using the uwsgi emperor to handle all of the uwsgi processes; these files are located under the uwsgi-emperor/vassals configuration directory. The following creates the uwsgi configuration file for handling keystone admin requests:

cat >> /etc/uwsgi-emperor/vassals/keystone-admin.ini << EOF
master = true
workers = 10
threads = 2
no-orphans = true
plugin = python
chmod-socket = 660

socket = /run/uwsgi/keystone-admin.sock
pidfile = /run/uwsgi/

logto = /var/log/uwsgi/app/keystone-admin.log

name = keystone
uid = keystone
gid = www-data

chdir = /var/www/keystone/
wsgi-file = /var/www/keystone/admin
EOF

Next, create the needed uwsgi configuration file for running keystone non-admin requests:

cat >> /etc/uwsgi-emperor/vassals/keystone-main.ini << EOF
master = true
workers = 10
threads = 2
no-orphans = true
plugin = python
chmod-socket = 660

socket = /run/uwsgi/keystone-main.sock
pidfile = /run/uwsgi/

name = keystone
uid = keystone
gid = www-data

logto = /var/log/uwsgi/app/keystone-main.log

chdir = /var/www/keystone/
wsgi-file = /var/www/keystone/main
EOF
Lastly, to finish the uwsgi configuration create the needed uwsgi configuration file for running horizon:

cat >> /etc/uwsgi-emperor/vassals/horizon.ini << EOF
master = true
workers = 10
threads = 2
no-orphans = true
plugin = python
chmod-socket = 660

socket = /run/uwsgi-horizon/horizon.sock
pidfile = /run/uwsgi-horizon/
logto = /var/log/uwsgi/app/horizon.log

name = horizon
uid = horizon
gid = www-data

chdir = /etc/openstack_dashboard/
env = DJANGO_SETTINGS_MODULE=openstack_dashboard.settings
module = django.core.wsgi:get_wsgi_application()
wsgi-file = /var/www/wsgi/horizon.wsgi
EOF
The nginx and uwsgi processes need to talk to each other. We will use Unix sockets for this, because they avoid the overhead of TCP sockets and are faster. Create the directories for the sockets, and set the permissions for the directories:

mkdir /run/uwsgi
mkdir /run/uwsgi-horizon
chown keystone:www-data /run/uwsgi
chown horizon:www-data /run/uwsgi-horizon

Lastly, we want to set the user and group for the various uwsgi processes within the individual vassal configuration files, so we need to remove these settings from the emperor configuration file:

sed -i 's/uid = www-data/#uid = www-data/g' /etc/uwsgi-emperor/emperor.ini
sed -i 's/gid = www-data/#gid = www-data/g' /etc/uwsgi-emperor/emperor.ini
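You can rehearse the substitution on a scratch copy first to confirm it does what you expect. The stand-in file below is a minimal sketch, not a real emperor.ini:

```shell
# Minimal stand-in for /etc/uwsgi-emperor/emperor.ini; only the two
# lines the sed commands target are included.
printf '[uwsgi]\nuid = www-data\ngid = www-data\n' > /tmp/emperor.ini

# Same substitutions as above, run against the scratch copy.
sed -i 's/uid = www-data/#uid = www-data/g' /tmp/emperor.ini
sed -i 's/gid = www-data/#gid = www-data/g' /tmp/emperor.ini

# Both lines should now be commented out.
grep '^#' /tmp/emperor.ini
```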

Create the nginx configuration file for keystone. Remember that keystone listens on ports 5000 for normal requests and 35357 for admin requests, so we will need server entries for each port in nginx:

cat >> /etc/nginx/sites-available/keystone.conf << EOF
server {
        listen          5000;
        access_log /var/log/nginx/keystone/access.log;
        error_log /var/log/nginx/keystone/error.log;

        location / {
            uwsgi_pass      unix:///run/uwsgi/keystone-main.sock;
            include         uwsgi_params;
            uwsgi_param     SCRIPT_NAME   '';
        }
}

server {
        listen          35357;
        access_log /var/log/nginx/keystone/access.log;
        error_log /var/log/nginx/keystone/error.log;

        location / {
            uwsgi_pass      unix:///run/uwsgi/keystone-admin.sock;
            include         uwsgi_params;
            uwsgi_param     SCRIPT_NAME   '';
        }
}
EOF
Create the configuration file for nginx and horizon:

cat >> /etc/nginx/sites-available/horizon.conf << EOF
    server {
    listen 80;
        access_log /var/log/nginx/horizon/access.log;
        error_log /var/log/nginx/horizon/error.log;

    location / { try_files  \$uri @horizon; }
    location @horizon {
        uwsgi_pass unix:///run/uwsgi-horizon/horizon.sock;
        include uwsgi_params;
        uwsgi_param      SCRIPT_NAME   '';
    }
    location /static {
        alias /var/www/static;
    }
}
EOF
Enable both the keystone and horizon functions (sites) in nginx:

ln -s /etc/nginx/sites-available/keystone.conf /etc/nginx/sites-enabled/keystone.conf
ln -s /etc/nginx/sites-available/horizon.conf /etc/nginx/sites-enabled/horizon.conf

Start both the uwsgi service and nginx:

service uwsgi-emperor start
service nginx start

Let's verify that keystone properly responds to requests:

root@controller:~# source openrc

root@controller:~# keystone tenant-list
+----------------------------------+---------+---------+
|                id                |   name  | enabled |
+----------------------------------+---------+---------+
| 9d314f96330a4e459420623a922e2c09 |   demo  |   True  |
| 4382d14df2004903a16edf17e3c58652 | service |   True  |
+----------------------------------+---------+---------+

root@controller:~# nova service-list
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host       | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| 1  | nova-cert        | controller | internal | enabled | up    | 2015-09-30T18:03:52.000000 | -               |
| 2  | nova-conductor   | controller | internal | enabled | up    | 2015-09-30T18:03:58.000000 | -               |
| 3  | nova-scheduler   | controller | internal | enabled | up    | 2015-09-30T18:03:54.000000 | -               |
| 4  | nova-compute     | compute    | nova     | enabled | up    | 2015-09-30T18:03:58.000000 | -               |
| 5  | nova-compute     | compute2   | nova     | enabled | down  | 2014-11-21T22:13:43.000000 | -               |
| 6  | nova-consoleauth | controller | internal | enabled | up    | 2015-09-30T18:03:52.000000 | -               |
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+

If you don't get valid responses from keystone or the other client agents, look at the nginx log files and the log files for keystone or the failing API service. Lastly, verify that horizon responds properly: open http://<server public IP> in your browser and log in. If the login succeeds, everything is working.

May 13, 2016 07:00 AM


Why Gartner’s Mode 1 / Mode 2 is Dangerous Thinking

It’s safe to say that no keynote speech in the past year has generated more conversation and controversy than Donna Scott’s keynote address before a record-breaking crowd of 7,500+ at the OpenStack Summit in Austin.

To be sure, Donna has cred. She’s a Vice President and Distinguished Analyst at Gartner. She’s covered private cloud and OpenStack almost from the beginning. She has a following. People listen to her. Donna and other Gartner analysts (including Alan Waite of Gartner for Technical Professionals) have a balanced, pros-and-cons take on OpenStack.

Donna’s talk was met with raised eyebrows and objections. If you’re interested in reading up on those objections, there are some great threads on Twitter, and Gartner analyst Alan Waite even wrote a blog post in response. OpenStack Foundation exec Lauren Sell also addressed the kerfuffle in a post-Summit wrap up blog.

But I want to focus on Bimodal IT. It’s Gartner’s map of how it advises enterprise IT leadership to think about transformation from legacy infrastructure and app dev models (sequential, emphasizing safety and accuracy) to agile, cloud-first models (exploratory and nonlinear, emphasizing agility and speed). Operationally, it can be thought of as the practice of managing two separate, coherent modes of IT delivery: mode one is focused on stability and mode two on agility. If you know nothing about agile, and your entire world revolves around supporting legacy apps on legacy infrastructure, the concept of Bimodal IT can be useful.

Unfortunately, that’s the only place it’s useful. Here’s why.

Bimodal IT essentially segregates the “good” technology, processes and skill sets (mode 1) from the “experimental” (mode 2). If competitors are out-innovating you with new products and value delivered from agile methodologies, then mode 2 is where you must go. Embrace mode 2 when you have no other alternative.

However, if you’re talking about “bet the company” applications, then mode 1 is where serious IT leaders go to do serious work. Gartner will disagree emphatically with this assessment, but any rational observer must conclude that, regardless of the intent, bimodal IT reads as the difference between serious IT (mode 1) and play time (mode 2).

The result? Bimodal IT alienates people in enterprise IT who see cloud and agile as nothing short of the next generational IT wave, following mainframes, client/server, and e-commerce. Bimodal IT bills itself as a roadmap for technology adoption, but in reality the concept picks winners and losers, causing confusion on priorities and strategies.

Yes, some legacy environments will take time to go away, like low-latency trading systems. However, this doesn’t mean that all aspects of mode 1 remain as they are – for example, cycle times can get shorter, approach can move to agile (remember, you don’t need “cloud” to go agile) etc.

But innovation and speed shouldn’t happen only in an isolated environment. To succeed in today’s competitive environment, enterprises MUST move towards agile deployment of services, they must foster innovation across the organization, and they should move IT decisions closer to the business and developers – leaving a decaying group within the enterprise will only slow it down.

Think about what their definitions of modes 1 and 2 are: the first focuses on safety and accuracy, the second on agility and speed. Last time I checked with a customer, all the new services they are rolling out that require agility and speed also need to be safe and accurate.

We work with many global 1000 organizations to enable them to adopt Open Infrastructure solutions. In fact, we recently postulated that today we are in an Open Infrastructure 2.0 world, where organizations are leveraging new processes, skills and technologies to enable agility and efficiency in the enterprise.

When it comes to adopting these new and fast-changing pieces, we recommend that the organization incubate the concept: draw from multiple disciplines within the existing organization, leverage outside experts to accelerate knowledge gathering, build quick and iterative pilots to demonstrate the success of these concepts and technologies, define metrics (KPIs) that show executives hard, tangible improvements, and then roll out the iterated, proven concepts to the rest of the organization. This will take months; it will not happen overnight.

This approach helps enterprises move towards “mode 2” and also takes “mode 1” along, without leaving it behind, in Gartner parlance. Our opinion—and indeed, that of anyone who has successfully guided an enterprise to agile—is that dividing IT into one bucket labeled “stuff that works” and another labeled “stuff that might be better one day” is dangerous.

I’m not alone in my opinion about the dangers of Bimodal IT. Bernard Golden, Jason Bloomberg, and Mark Campbell each have instructive views on the topic, all of which precede Donna’s keynote in Austin.

While mine is an opinion forged in the furnace of building agile infrastructures and teaching organizations how to use it, you might have a different take. Would love to hear your take on Bimodal IT.


Author: Francesco Paola, CEO, Solinea

The post Why Gartner’s Mode 1 / Mode 2 is Dangerous Thinking appeared first on Solinea.

by Solinea at May 13, 2016 04:06 AM

May 12, 2016


Update of the Mediawiki Apps

Cloudwatt provides an update of the Mediawiki apps to be aligned with the latest available Mediawiki version: 1.26.

by Julien DEPLAIX at May 12, 2016 10:00 PM

IBM OpenTech Team

Industry leaders headline IBM’s Open Cloud Architecture Summit in Seattle

This June, we guarantee the clouds will open in Seattle.

IBM will host another edition of its Open Cloud Architecture (OCA) Summit in The Emerald City on June 22—the day after DockerCon (giving attendees a welcome excuse to stay in Seattle for an extra day!).

The theme for this #IBMOCA event is “Learning to love open hybrid cloud.” As more businesses adopt hybrid cloud ecosystems, the need for open technology has never been greater. We’ve seen time and again that when systems speak a common language, innovation can happen faster. The same goes for hybrid cloud, and technologies like OpenStack, Cloud Foundry, Docker and more are helping to create the open ecosystem of the future.

The event will take place in Seattle’s Palace Ballroom, and our packed agenda kicks off at 11:30am with a networking lunch. We’ll end with a closing reception from 4-4:45 p.m.

We’ve held OCA Summits before, but never like this one. Our lineup features an eclectic mix of industry leaders and IBMers who will discuss trends, offer industry viewpoints, share success stories and build on the open cloud architecture discourse. Attendees and speakers will include business and technology leaders, industry experts, analysts and members of the press.

Among the IBMers set to speak are Angel Diaz, IBM’s vice president of Cloud Architecture and Technology, plus leaders from IBM’s recent Seattle-based acquisition, Blue Box. More information can be found on our speaker page – check back often for updates!

We encourage you to save the date and join us in Seattle for the Open Architecture Summit. Link here to register today!

You can follow news and updates surrounding the event on Twitter with the hashtag #IBMOCA.

by Johanna Koester at May 12, 2016 08:14 PM


StackLight – the Logging, Monitoring and Alerting (LMA) toolchain of Mirantis OpenStack

The post StackLight – the Logging, Monitoring and Alerting (LMA) toolchain of Mirantis OpenStack appeared first on Mirantis | The Pure Play OpenStack Company.

In a post of December 2015, I introduced the concepts and base building blocks underpinning the so-called Logging, Monitoring and Alerting (LMA) Toolchain of Mirantis OpenStack, now officially called StackLight. The purpose of this second post is to talk about what’s new in StackLight 0.9 (compatible with Mirantis OpenStack 8.0), which you can download from the Fuel Plugins Catalog.

The depth and breadth of the new features we have added in this release is quite significant as outlined below. The main theme of these features revolves around resiliency and scale requirements.

Clustering of the backend servers for high availability and scale

There are still four Fuel plugins in the toolchain, but in StackLight 0.9, those plugins can be deployed on a cluster of nodes for high availability and scale. Note that the cluster of nodes can be made of physical machines or virtual machines using the Reduced Footprint feature of Fuel.

Setting up a StackLight cluster is actually quite simple because all the heavy lifting is done for you automatically and transparently by the plugins. All you need to do is assign the StackLight Fuel plugin roles to nodes in your environment and deploy, as shown in Figure 1.


Figure 1: Assigning roles to StackLight servers

One improvement over StackLight 0.8 is that StackLight 0.9 is composed of hot-pluggable plugins, which means that it is possible to deploy your StackLight cluster after you have deployed your OpenStack environment, though installation of the Collectors on the OpenStack nodes requires a configuration change and a restart of all the OpenStack services.

InfluxDB-Grafana Plugin highlights:

The InfluxDB-Grafana Plugin also has some additional new features, including:

  • Upgrade to InfluxDB 0.10.0 with clustering support (considered beta by InfluxData).
  • The TSM storage engine is advertised by InfluxData to sustain write load of more than 350K points per sec on a fast disk (ideally an SSD).
  • The InfluxDB cluster must have at least 3 meta nodes in order to form a Raft consensus.
  • Clustering is used for HA (not scale as all time-series are replicated in the cluster) for both InfluxDB and Grafana.
  • Added configurable retention period in the plugin settings (30 days by default).
  • Fuel plugin support for InfluxDB clustering includes:
    • Deployment of InfluxDB on one or three nodes. The deployment of InfluxDB on two nodes (for data replication) is technically possible but it is not recommended (nor supported) as there may be situations where the failover will not work properly.
    • The ability to add and remove nodes after deployment via the Fuel UI.
    • All nodes are both meta nodes and data nodes.
    • The time-series are synchronously replicated on all nodes.
    • The API endpoint VIP is managed by HAProxy and Pacemaker.
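For context, a 30-day retention period like the plugin default maps onto an InfluxDB retention policy. Expressed directly in InfluxQL it would look roughly like this (the database name "lma" and the policy name are assumptions for illustration, not the plugin's actual identifiers):

```sql
-- Hypothetical equivalent of the plugin's default 30-day retention,
-- replicated across a 3-node cluster.
CREATE RETENTION POLICY "lma_30d" ON "lma" DURATION 30d REPLICATION 3 DEFAULT
```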

Elasticsearch-Kibana Plugin highlights:

New features in the Elasticsearch-Kibana plugin include:

  • Upgrade to Elasticsearch 1.7.4, bringing better resiliency, new features, security fixes, clustering stability and recovery improvements.
  • Clustering, used for both scale and HA for both Elasticsearch and Kibana.
  • The cluster must have at least three nodes to avoid split-brain issues.
  • Configurable retention period in the plugin settings (30 days by default).
  • Fuel plugin support for Elasticsearch clustering includes:
    • Cluster size up to five nodes
    • All nodes store data and can be elected master
    • Five shards per index type per day
    • Data is replicated on all nodes but is configurable in the plugin settings
    • The ability to add and remove nodes after deployment via Fuel UI
    • The API endpoint VIP is managed by HAProxy and Pacemaker

In addition to clustering support, StackLight 0.9 comes with a number of bug fixes that are detailed in the Release Notes, which are available in the plugin documentation, as well as several other new capabilities.

Logs monitoring

A high rate of errors in the logs is often an indication that something is going wrong and should be acted upon. The good news is that an unusual error rate in the logs can now be detected thanks to a new log_messages metric that contains a logging rate value per severity level and per service. As with any other metric, the log_messages metric can be added to an alarm rule that will fire an anomaly and fault detection (AFD) metric if the logging rate, for a given severity level, such as ‘ERROR’, exceeds a threshold.
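As a purely illustrative sketch (the layout, field names and threshold below are assumptions for illustration, not the plugin's exact alarm schema), such a rule might look something like:

```yaml
# Hypothetical StackLight-style alarm rule: fire when the rate of
# ERROR-level log messages for a service exceeds a threshold.
- name: 'log-error-rate-high'
  description: 'Unusual rate of ERROR log messages'
  severity: 'warning'
  rules:
    - metric: log_messages          # the new metric described above
      fields:
        level: error                # assumed field name for the severity level
      relational_operator: '>'
      threshold: 10                 # assumed threshold: messages per window
      window: 60
      function: max
```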

Worker alarms per node

Prior to StackLight 0.9, it wasn’t possible to know on which node a particular OpenStack worker was down. This information is now captured and displayed in the Grafana dashboards for all the OpenStack core services.

Libvirt Instances Monitoring

StackLight 0.9 introduces the monitoring of libvirt instances. Ceilometer is not used at this stage, so the instance metrics are not tagged with Nova metadata such as the tenant ID. A new Hypervisor Grafana dashboard was created to visualize those metrics instead. The libvirt metrics can be viewed in the Hypervisor dashboard by node name, instance ID, disk and interface name dimensions.

How to get StackLight 0.9

To get started with StackLight 0.9, first deploy Mirantis OpenStack 8.0. From there, you can go to the Fuel Plugins Catalog and search for MOS 8.0 plugins in the MONITORING category as shown below.



We’ve created a video overview and demo of Stacklight. Check it out!


by Patrick Petit at May 12, 2016 07:44 PM

Daniel P. Berrangé

Analysis of techniques for ensuring migration completion with KVM

Live migration is a long-standing feature in QEMU/KVM (and other competing virtualization platforms); however, by default it does not cope very well with guests whose workloads are very memory-write intensive. It is very easy to create a guest workload that will ensure a migration never completes in the default configuration. For example, a guest which continually writes to each byte in a 1 GB region of RAM will never successfully migrate over a 1Gb/sec NIC. Even with a 10Gb/s NIC, a slightly larger guest can dirty memory fast enough to prevent completion without an unacceptably large downtime at switchover. Thus over the years, a number of optional features have been developed for QEMU with the aim of helping migration to complete.

If you don’t want to read the background information on migration features and the testing harness, skip right to the end where there are a set of data tables showing charts of the results, followed by analysis of what this all means.

The techniques available

  • Downtime tuning. Unless the guest is completely idle, it is never possible to get to a point where 100% of memory has been transferred to the target host. So at some point a decision must be made about whether enough memory has been transferred to allow the switch over to the target host with an acceptable blackout period. The downtime tunable controls how long a blackout period is permitted during the switchover. QEMU measures the network transfer rate it is achieving and compares it to the amount of outstanding RAM to determine if it can be transferred within the configured downtime window. When migrating, it is not desirable to set QEMU to use the maximum accepted downtime straightaway, as that guarantees that the guest will always suffer the maximum downtime blackout. Instead, it is better to start off with a fairly small downtime value and increase the permitted downtime as time passes. The idea is to maximise the likelihood that migration can complete with a small downtime.
  • Bandwidth tuning. If the migration is taking place over a NIC that is used for other non-migration related actions, it may be desirable to prevent the migration stream from consuming all bandwidth. As noted earlier though, even a relatively small guest is capable of dirtying RAM fast enough that even a 10Gbs NIC will not be able to complete migration. Thus if the goal is to maximise the chances of getting a successful migration though, the aim should be to maximise the network bandwidth available to the migration operation. Following on from this, it is wise not to try to run multiple migration operations in parallel unless their transfer rates show that they are not maxing out the available bandwidth, as running parallel migrations may well mean neither will ever finish.
  • Pausing CPUs. The simplest and crudest mechanism for ensuring guest migration completes is to simply pause the guest CPUs. This prevents the guest from continuing to dirty memory, and thus even on the slowest network it will ensure migration completes in a finite amount of time. The cost is that the guest workload will be completely stopped for a prolonged period of time. Think of pausing the guest as being equivalent to setting an arbitrarily long maximum permitted downtime. For example, assuming a guest with 8 GB of RAM and an idle 10Gbs NIC, in the worst case pausing would lead to an approx 6 second period of downtime. If higher speed NICs are available, the impact of pausing will decrease until it converges with a typical max downtime setting.
  • Auto-convergence. The rate at which a guest can dirty memory is related to the amount of time the guest CPUs are permitted to run for. Thus by throttling the CPU execution time it is possible to prevent the guest from dirtying memory so quickly, allowing the migration data transfer to keep ahead of RAM dirtying. If this feature is enabled, by default QEMU starts by cutting 20% of the guest vCPU execution time. At the start of each iteration over RAM, it will check progress during the previous two iterations. If insufficient forward progress is being made, it will repeatedly cut a further 10% off the running time allowed to vCPUs. QEMU will throttle CPUs all the way to 99%. This should guarantee that migration can complete on all but the most sluggish networks, but has a pretty high cost to guest CPU performance. It is also indiscriminate in that all guest vCPUs are throttled by the same factor, even if only one guest process is responsible for the memory dirtying.
  • Post-copy. Normally migration will only switch over to running on the target host once all RAM has been transferred. With post-copy, the goal is to transfer “enough” or “most” RAM across and then switch over to running on the target. When the target QEMU gets a fault for a memory page that has not yet been transferred, it’ll make an explicit out of band request for that page from the source QEMU. Since it is possible to switch to post-copy mode at any time, it avoids the entire problem of having to complete migration in a fixed downtime window. The cost is that while running in post-copy mode, guest page faults can be quite expensive, since there is a need to wait for the source host to transfer the memory page over to the target, which impacts performance of the guest during post-copy phase. If there is a network interruption while in post-copy mode it will also be impossible to recover. Since neither the source or target host has a complete view of the guest RAM it will be necessary to reboot the guest.
  • Compression. The migration pages are usually transferred to the target host as-is. For many guest workloads, memory page contents will be fairly easily compressible. So if there are available CPU cycles on the source host and the network bandwidth is a limiting factor, it may be worth while burning source CPUs in order to compress data transferred over the network. Depending on the level of compression achieved it may allow migration to complete. If the memory is not compression friendly though, it would be burning CPU cycles for no benefit. QEMU supports two compression methods, XBZRLE and multi-thread, either of which can be enabled. With XBZRLE a cache of previously sent memory pages is maintained that is sized to be some percentage of guest RAM. When a page is dirtied by the guest, QEMU compares the new page contents to that in the cache and then only sends a delta of the changes rather than the entire page. For this to be effective the cache size must generally be quite large – 50% of guest RAM would not be unreasonable.  The alternative compression approach uses multiple threads which simply use zlib to directly compress the full RAM pages. This avoids the need to maintain a large cache of previous RAM pages, but is much more CPU intensive unless hardware acceleration is available for the zlib compression algorithm.
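The pausing estimate above is straightforward arithmetic – in the worst case the entire guest RAM must cross the wire while the guest is stopped – and can be sketched as:

```shell
# Worst-case pause downtime = RAM size / NIC bandwidth.
# 8 GB of RAM is 64 Gbit; over an idle 10 Gbit/s NIC that takes
# 6.4 seconds, matching the "approx 6 second" figure quoted above.
awk 'BEGIN { ram_gbyte = 8; nic_gbit = 10; printf "%.1f s\n", ram_gbyte * 8 / nic_gbit }'
# prints: 6.4 s
```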

Measuring impact of the techniques

Understanding what the various techniques do in order to maximise chances of a successful migration is useful, but it is hard to predict how well they will perform in the real world when faced with varying workloads. In particular, are they actually capable of ensuring completion under worst case workloads and what level of performance impact do they actually have on the guest workload. This is a problem that the OpenStack Nova project is currently struggling to get a clear answer on, with a view to improving Nova’s management of libvirt migration. In order to try and provide some guidance in this area, I’ve spent a couple of weeks working on a framework for benchmarking QEMU guest performance when subjected to the various different migration techniques outlined above.

In OpenStack the goal is for migration to be a totally “hands off” operation for cloud administrators. They should be able to request a migration and then forget about it until it completes, without having to baby sit it to apply tuning parameters. The other goal is that the Nova API should not have to expose any hypervisor specific concepts such as post-copy, auto-converge, compression, etc. Essentially Nova itself has to decide which QEMU migration features to use and just “do the right thing” to ensure completion. Whatever approach is chosen needs to be able to cope with any type of guest workload, since the cloud admins will not have any visibility into what applications are actually running inside the guest. With this in mind, when it came to performance testing the QEMU migration features, it was decided to look at their behaviour when faced with the worst case scenario. Thus a stress program was written which would allocate many GB of RAM, and then spawn a thread on each vCPU that would loop forever xor’ing every byte of RAM against an array of bytes taken from /dev/random. This ensures that the guest is both heavy on reads and writes to memory, as well as creating RAM pages which are very hostile towards compression. This stress program was statically linked and built into a ramdisk as the /init program, so that Linux would boot and immediately run this stress workload in a fraction of a second. In order to measure performance of the guest, each time 1 GB of RAM has been touched, the program will print out details of how long it took to update this GB and an absolute timestamp. These records are captured over the serial console from the guest, to be later correlated with what is taking place on the host side wrt migration.

Next up, it was time to create a tool to control QEMU from the host and manage the migration process, activating the desired features. A test scenario was defined which encodes details of what migration features are under test and their settings (number of iterations before activating post-copy, bandwidth limits, max downtime values, number of compression threads, etc). A hardware configuration was also defined which expressed the hardware characteristics of the virtual machine running the test (number of vCPUs, size of RAM, host NUMA memory & CPU binding, usage of huge pages, memory locking, etc). The tests/migration/ tool provides the mechanism to invoke the test in any of the possible configurations. For example, to test post-copy migration, switching to post-copy after 3 iterations, allowing 1Gbs bandwidth on a guest with 4 vCPUs and 8 GB of RAM, one might run:

$ tests/migration/ --cpus 4 --mem 8 --post-copy --post-copy-iters 3 --bandwidth 125 --dst-host myotherhost --transport tcp --output postcopy.json

The postcopy.json file contains the full report of the test results. This includes all details of the test scenario and hardware configuration, the migration status recorded at the start of each iteration over RAM, the host CPU usage recorded once a second, and the guest stress test output. The accompanying tests/migration/ tool can consume this data file and produce interactive HTML charts illustrating the results.

$ tests/migration/ --split-guest-cpu --qemu-cpu --vcpu-cpu --migration-iters --output postcopy.html postcopy.json

To assist in making comparisons between runs, however, a set of standardized test scenarios was also defined which can be run via a tests/migration/ tool, in which case it is merely required to provide the desired hardware configuration:

$ tests/migration/ --cpus 4 --mem 8 --dst-host myotherhost --transport tcp --output myotherhost-4cpu-8gb

This will run all the standard defined test scenarios and save many data files in the myotherhost-4cpu-8gb directory. The same tool can be used to create charts combining multiple data sets at once to allow easy comparison.

Performance results for QEMU 2.6

With the tools written, I went about running some tests against QEMU GIT master codebase, which was effectively the same as the QEMU 2.6 code just released. The pair of hosts used were Dell PowerEdge R420 servers with 8 CPUs and 24 GB of RAM, spread across 2 NUMA nodes. The primary NICs were Broadcom Gigabit, but it has been augmented with Mellanox 10-Gig-E RDMA capable NICs, which is what were picked for transfer of the migration traffic. For the tests I decided to collect data for two distinct hardware configurations, a small uniprocessor guest (1 vCPU and 1 GB of RAM) and a moderately sized multi-processor guest (4 vCPUs and 8 GB of RAM). Memory and CPU binding was specified such that the guests were confined to a single NUMA node to avoid performance measurements being skewed by cross-NUMA node memory accesses. The hosts and guests were all running the RHEL-7 3.10.0-0369.el7.x86_64 kernel.

To understand the impact of different network transports & their latency characteristics, the two hardware configurations were combinatorially expanded against 4 different network configurations – a local UNIX transport, a localhost TCP transport, a remote 10Gbs TCP transport and a remote 10Gbs RMDA transport.

The full set of results are linked from the tables that follow. The first link in each row gives a guest CPU performance comparison for each scenario in that row. The other cells in the row give the full host & guest performance details for that particular scenario.

UNIX socket, 1 vCPU, 1 GB RAM

Using UNIX socket migration to local host, guest configured with 1 vCPU and 1 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

UNIX socket, 4 vCPU, 8 GB RAM

Using UNIX socket migration to local host, guest configured with 4 vCPU and 8 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

TCP socket local, 1 vCPU, 1 GB RAM

Using TCP socket migration to local host, guest configured with 1 vCPU and 1 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

TCP socket local, 4 vCPU, 8 GB RAM

Using TCP socket migration to local host, guest configured with 4 vCPU and 8 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

TCP socket remote, 1 vCPU, 1 GB RAM

Using TCP socket migration to remote host, guest configured with 1 vCPU and 1 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

TCP socket remote, 4 vCPU, 8 GB RAM

Using TCP socket migration to remote host, guest configured with 4 vCPU and 8 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

RDMA socket, 1 vCPU, 1 GB RAM

Using RDMA socket migration to remote host, guest configured with 1 vCPU and 1 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

RDMA socket, 4 vCPU, 8 GB RAM

Using RDMA socket migration to remote host, guest configured with 4 vCPU and 8 GB of RAM

Scenario Tunable
Pause unlimited BW 0 iters 1 iters 5 iters 20 iters
Pause 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Post-copy unlimited BW 0 iters 1 iters 5 iters 20 iters
Post-copy 5 iters 100 mbs 300 mbs 1 gbs 10 gbs unlimited
Auto-converge unlimited BW 5% CPU step 10% CPU step 20% CPU step
Auto-converge 10% CPU step 100 mbs 300 mbs 1 gbs 10 gbs unlimited
MT compression unlimited BW 1 thread 2 threads 4 threads
XBZRLE compression unlimited BW 5% cache 10% cache 20% cache 50% cache

Analysis of results

The charts above provide the full set of raw results, from which you are welcome to draw your own conclusions. The test harness is also posted on the qemu-devel mailing list and will hopefully be merged into GIT at some point, so anyone can repeat the tests or run tests to compare other scenarios. What follows now is my interpretation of the results and the interesting points they show.

  • There is a clear periodic pattern in guest performance that coincides with the start of each migration iteration. Specifically, at the start of each iteration there is a notable and consistent momentary drop in guest CPU performance. Picking an example where this effect is clearly visible – the 1 vCPU, 1 GB RAM config with the “Pause 5 iters, 300 mbs” test – we can see the guest CPU performance drop from 200ms/GB of data modified to 450ms/GB. QEMU maintains a bitmap associated with guest RAM to track which pages are dirtied by the guest while migration is running. At the start of each iteration over RAM, this bitmap has to be read and reset, and this action is what is responsible for the momentary drop in performance.
  • With the larger guest sizes, there is a second, roughly periodic but slightly more chaotic pattern in guest performance that is continual throughout migration. The magnitude of these spikes is about 1/2 that of those occurring at the start of each iteration. An example where this effect is clearly visible is the 4 vCPU, 8 GB RAM config with the “Pause unlimited BW, 20 iters” test – we can see the guest CPU performance drop from 500ms/GB to between 700ms/GB and 800ms/GB. The host NUMA node that the guest is confined to has 4 CPUs, and the guest itself has 4 vCPUs. When migration is running, QEMU has a dedicated thread performing the migration data I/O, and this is sharing time on the 4 host CPUs with the guest CPUs. So with QEMU emulator threads sharing the same pCPUs as the vCPU threads, we have 5 workloads competing for 4 CPUs. IOW the frequent, slightly chaotic spikes in guest performance throughout each migration iteration are a result of overcommitting the host pCPUs. The magnitude of the spikes is directly proportional to the total transfer bandwidth permitted for the migration. This is not an inherent problem with migration – it would be possible to place the QEMU emulator threads on a separate pCPU from the vCPU threads if strong isolation is desired between the guest workload and migration processing.
  • The baseline guest CPU performance differs between the 1 vCPU, 1 GB RAM and 4 vCPU, 8 GB RAM guests. Comparing the UNIX socket “Pause unlimited BW, 20 iters” test results for these 1 vCPU and 4 vCPU configs, we see the former has a baseline performance of 200ms/GB of data modified while the latter has 400ms/GB of data modified. This clearly has nothing to do with migration at all. Naively one might think that going from 1 vCPU to 4 vCPUs would result in 4 times the performance, since we have 4 times more threads available to do work. What we’re seeing here is likely the result of hitting the memory bandwidth limit, so each vCPU is competing for memory bandwidth and thus the overall performance of each vCPU has decreased. So instead of a x4 gain, going from 1 to 4 vCPUs only doubled the performance.
  • When post-copy is operating in its pre-copy phase, it has no measurable impact on the guest performance compared to when post-copy is not enabled at all. This can be seen by comparing the TCP socket “Pause 5 iters, 1 Gbs” test results with the “Post-copy 5 iters, 1 Gbs” test results. Both show the same baseline guest CPU performance and the same magnitude of spikes at the start of each iteration. This shows that it is viable to unconditionally enable the post-copy feature for all migration operations, even if the migration is likely to complete without needing to switch from the pre-copy to the post-copy phase. It gives the admin/app the flexibility to decide on the fly whether to switch to post-copy mode or stay in pre-copy mode until completion.
  • When post-copy migration switches from its pre-copy phase to the post-copy phase, there is a major but short-lived spike in guest CPU performance. What is happening here is that the guest has perhaps 80% of its RAM transferred to the target host when the post-copy phase starts, but the guest workload is touching some pages which are still on the source, so the page fault has to wait for the page to be transferred across the network. The magnitude of the spike and the duration of the post-copy phase are related to the total guest RAM size and the bandwidth available. Taking the remote TCP case with the 1 vCPU, 1 GB RAM hardware config for clarity, and comparing the “Post-copy 5 iters, 1Gbs” scenario with the “Post-copy 5 iters, 10Gbs” scenario, we can see the spike in guest performance is of the same order of magnitude in both cases. The overall time for each iteration of the pre-copy phase is clearly shorter in the 10Gbs case. If we further compare with the local UNIX domain socket, we can see the spike in performance at the post-copy phase is much lower. What this is telling us is that the magnitude of the spike in the post-copy phase is largely driven by the latency of transferring an out-of-band requested page from the source to the target, rather than by the overall bandwidth available. There are plans in QEMU to allow migration to use multiple TCP connections, which should significantly reduce the post-copy latency spike, as the out-of-band requested pages will not get stalled behind a long TCP transmit queue for the background bulk-copy.
  • Auto-converge will often struggle to ensure convergence for larger guest sizes or when the bandwidth is limited. Considering the 4 vCPU, 8 GB RAM remote TCP test comparing the effects of different bandwidth limits, we can see that with a 10Gbs bandwidth cap, auto-converge had to throttle to 80% to allow completion, while other tests show as much as 95% or even 99% in some cases. With a lower bandwidth limit of 1Gbs, the test case timed out after 5 minutes of running, having only throttled the guest down by 20%, showing auto-converge is not nearly aggressive enough when faced with low bandwidth links. The worst case guest performance seen when running auto-converge with CPUs throttled to 80% was on a par with that seen with post-copy immediately after switching to the post-copy phase. The difference is that auto-converge sustains that worst-case hit for a very long time during pre-copy, potentially many minutes, whereas post-copy only showed it for a few seconds.
  • Multi-thread compression was actively harmful to the chances of a successful migration. Considering the 4 vCPU, 8 GB RAM remote TCP test comparing thread counts, we can see that increasing the number of threads actually made performance worse, with fewer iterations over RAM being completed before the 5 minute timeout was hit. The longer each iteration takes, the more time the guest has to dirty RAM, so the less likely migration is to complete. There are two factors believed to be at work here that make the MT compression results so bad. First, as noted earlier, QEMU is confined to 4 pCPUs, so with 4 vCPUs running, the compression threads have to compete for time with the vCPU threads, slowing down the speed of compression. Second, the stress test workload run in the guest is writing completely random bytes, which are a pathological input dataset for compression, allowing almost no compression. Given that the compression was CPU limited though, even if there had been a good compression ratio, it would be unlikely to have a significant benefit, since the increased time to iterate over RAM would allow the guest to dirty more data, eliminating the advantage of compressing it. If the QEMU emulator threads were given dedicated host pCPUs to run on, it may have increased the performance somewhat, but that assumes the host has CPUs free that are not running other guests.
  • XBZRLE compression fared a little better than MT compression. Again considering the 4 vCPU, 8 GB RAM remote TCP test comparing RAM cache sizing, we can see that the time required for each iteration over RAM did not noticeably increase. This shows that while XBZRLE compression did have a notable impact on guest CPU performance, it is not hitting a major bottleneck in the processing of each page, as compared to MT compression. Again though, it did not help to achieve migration completion, with all tests timing out after 5 minutes or 30 iterations over RAM. This is due to the fact that the guest stress workload is again delivering input data that hits the pathological worst case in the algorithm. Faced with such a workload, no matter how much CPU time or RAM cache is available, XBZRLE can never have any positive impact on migration.
  • The RDMA data transport showed up a few of its quirks. First, by looking at the RDMA results comparing pause bandwidth, we can clearly identify a bug in QEMU’s RDMA implementation – it is not honouring the requested bandwidth limits – it always transfers at maximum link speed. Second, all the post-copy results show failure, confirming that post-copy is currently not compatible with RDMA migration. When comparing 10Gbs RDMA against 10Gbs TCP transports, there is no obvious benefit to using RDMA – it was not any more likely to complete migration in any of the test scenarios.
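The dirty-bitmap scan-and-reset step behind the first point above can be illustrated with a toy model. This is purely conceptual (QEMU's real implementation works on per-memory-region bitmaps retrieved from KVM and is far more involved), but it shows why the whole bitmap must be read and cleared in one pass at the start of each iteration:

```python
# Toy model of the per-iteration dirty bitmap handling. One bit per
# guest RAM page; the guest sets bits as it writes, migration reads
# and clears them at the start of each iteration over RAM.

def scan_and_reset(dirty_bitmap):
    """Return the indices of dirty pages and clear the bitmap.

    The single pass over the whole bitmap is the work that causes the
    momentary guest CPU performance drop at each iteration start."""
    dirty_pages = [i for i, bit in enumerate(dirty_bitmap) if bit]
    for i in dirty_pages:
        dirty_bitmap[i] = 0
    return dirty_pages
```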
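For reference, the post-copy and auto-converge behaviours discussed above are driven through the QMP monitor. The helpers below only build the JSON command payloads (capability and parameter names as used by QEMU 2.5/2.6-era QMP); actually delivering them over the monitor socket is omitted:

```python
def qmp_enable_postcopy():
    """Enable the postcopy-ram capability before migration starts; it can
    be enabled unconditionally since the pre-copy phase is unaffected."""
    return {"execute": "migrate-set-capabilities",
            "arguments": {"capabilities": [
                {"capability": "postcopy-ram", "state": True}]}}

def qmp_start_postcopy():
    """Switch an in-progress migration from pre-copy to post-copy."""
    return {"execute": "migrate-start-postcopy"}

def qmp_setup_auto_converge(initial_pct, increment_pct):
    """Enable auto-converge with an explicit throttling policy; the
    values are percentages of guest CPU time to remove."""
    return [
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "auto-converge", "state": True}]}},
        {"execute": "migrate-set-parameters",
         "arguments": {"cpu-throttle-initial": initial_pct,
                       "cpu-throttle-increment": increment_pct}},
    ]
```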
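The two compression features are enabled the same way, via a capability plus tuning knobs (again only building the payloads; names per QEMU 2.6-era QMP):

```python
def qmp_setup_mt_compression(threads, level):
    """Enable multi-thread compression with the given worker thread
    count and zlib compression level (1 = fastest, 9 = best)."""
    return [
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "compress", "state": True}]}},
        {"execute": "migrate-set-parameters",
         "arguments": {"compress-threads": threads,
                       "compress-level": level}},
    ]

def qmp_setup_xbzrle(cache_bytes):
    """Enable XBZRLE with a page cache of the given size in bytes."""
    return [
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "xbzrle", "state": True}]}},
        {"execute": "migrate-set-cache-size",
         "arguments": {"value": cache_bytes}},
    ]

# e.g. a cache of 10% of an 8 GB guest:
# qmp_setup_xbzrle(int(8 * 1024**3 * 0.10))
```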
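Finally, the bandwidth caps used throughout the tests (the limit the RDMA transport was observed to ignore) are set with the legacy migrate_set_speed QMP command, which takes a value in bytes per second:

```python
def qmp_set_migration_bandwidth(bytes_per_sec):
    """Build the QMP payload capping migration transfer bandwidth."""
    return {"execute": "migrate_set_speed",
            "arguments": {"value": bytes_per_sec}}

# e.g. a 1 Gbs cap expressed in bytes/second:
# qmp_set_migration_bandwidth(1000 ** 3 // 8)
```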

Considering all the different features tested, post-copy is the clear winner. It was able to guarantee completion of migration every single time, regardless of guest RAM size, with minimal long-lasting impact on guest performance. While it did have a notable spike impacting guest performance at the time of the switch from the pre-copy to the post-copy phase, this impact was short lived, only a few seconds. The next best result was seen with auto-converge, which again managed to complete migration in the majority of cases. By comparison with post-copy, the worst case impact seen on guest CPU performance was the same order of magnitude, but it lasted for a very long time, many minutes long. In addition, in more bandwidth limited scenarios, auto-converge was unable to throttle guest CPUs quickly enough to avoid hitting the overall 5 minute timeout, whereas post-copy would always succeed except in the most limited bandwidth scenarios (100Mbs – where no strategy can ever work). The other benefit of post-copy is that only the guest OS thread responsible for the page fault is delayed – other threads in the guest OS will continue running at normal speed if their RAM is already on the host. With auto-converge, all guest CPUs and threads are throttled regardless of whether they are responsible for dirtying memory. IOW post-copy has a targeted performance hit, whereas auto-converge is indiscriminate. Finally, as noted earlier, post-copy does have a failure scenario which can result in losing the VM if the network to the source host is lost for long enough to timeout the TCP connection while in post-copy mode. This risk can be mitigated with redundancy at the network layer, and the VM is only at risk for the short period of time it is running in post-copy mode, which is mere seconds with a 10Gbs link.

It was expected that the compression features would fare badly given the guest workload, but the impact was far worse than expected, particularly for MT compression. Given the major requirement compression has in terms of host CPU time (MT compression) or host RAM (XBZRLE compression), they do not appear to be viable as general purpose features. They should only be used if the workloads are known to be compression friendly, the host has the CPU and/or RAM resources to spare, and neither post-copy nor auto-converge is possible to use. To make these features more practical to use in an automated, general purpose manner, QEMU would have to be enhanced to give the mgmt application direct control over turning them on and off during migration. This would allow the app to try using compression, monitor its effectiveness and then turn compression off if it is being harmful, rather than having to abort the migration entirely and restart it.

There is scope for further testing with RDMA, since the hardware used for testing was limited to 10Gbs. Newer RDMA hardware is supposed to be capable of reaching higher speeds, 40Gbs or even 100Gbs, which would have a correspondingly positive impact on the ability to migrate. At least for speeds of 10Gbs or less though, it does not appear worthwhile to use RDMA; apps would be better off using TCP in combination with post-copy.

In terms of network I/O, no matter what the guest workload, QEMU is generally capable of saturating whatever link is used for migration for as long as it takes to complete. It is very easy to create workloads that will never complete, and decreasing the available bandwidth just decreases the chances of migration completing. It might be tempting to think that if you have 2 guests, it would take the same total time whether you migrate them one after the other or migrate them in parallel. This is not necessarily the case though, as with a parallel migration the bandwidth will be shared between them, which increases the chances that neither guest will ever be able to complete. So as a general rule it appears wise to serialize all migration operations on a given host, unless there are multiple NICs available.

In summary, use post-copy if it is available, otherwise use auto-converge. Don’t bother with compression unless the workload is known to be very compression friendly. Don’t bother with RDMA unless it supports more than 10 Gbs, otherwise stick with plain TCP.

by Daniel Berrange at May 12, 2016 03:17 PM

