February 22, 2018

OpenStack Superuser

Getting Network Function Virtualization ready for prime time: Inside the standards tests

Network Function Virtualization (NFV) aims to move the network functions that typically run in dedicated appliances onto generic servers, so that they gain all the advantages of cloud applications, which is almost a must for today’s use cases like 5G.

Since its conception in 2012 by the world’s leading telecom network operators through this white paper and motivated by the increasing costs of building networks with proprietary hardware appliances, NFV has been constantly evolving through European Telecommunications Standards Institute (ETSI) standards. Deployments have been growing slowly but surely, paving the way to a brighter future for the telecom industry.  Since 2017, activities like the ETSI NFV Plugtests have been accelerating deployments by improving both interoperability and technology readiness.

In this first post in a three-part series, I’ll summarize results from ETSI’s second NFV Plugtests held in January 2018, where I participated representing Whitestack’s NFV solutions. It was an opportunity for vendors and open source communities to get together, assess interoperability and validate solutions against ETSI NFV specifications.

Before diving into the tests, let’s quickly review ETSI NFV Architecture, where we can see three fundamental divisions:

  • NFVI + VIM: The physical compute, storage and networking resources that will be virtualized (NFV Infrastructure), and the software that manages their lifecycle (Virtual Infrastructure Manager). The VIM belongs to NFV MANO according to the ETSI Architecture, but since OpenStack and other VIM tools already solve VM lifecycle, actual MANO focus is on the higher layers.
  • MANO: The Management & Orchestration software components (NFV Orchestrator and VNF Manager) that take care of the lifecycle of Network Services and Virtual Network Functions.
  • VNFs: The actual network functions, composed of one or more virtual machines (or, more generally, containers), that can be integrated to build end-to-end (virtualized) Network Services.

So back to the Plugtests: many experienced VNF vendors and the most relevant NFVI, VIM and MANO providers of the industry met at the ETSI headquarters at Sophia Antipolis, France where we spent a whole week inside a big room going over a number of interoperability tests.

Testing inside ETSI’s headquarters.

Of course, to make this big challenge possible (dozens of companies, some of them competitors, working together), ETSI organized things well in advance. They set up weekly group calls around four months beforehand, put in place a VPN called the NFV Plugtests HIVE so everyone could connect their solutions ahead of time (see the image below with the participants spanning the globe) and ensured everyone completed a ‘pre-testing’ process before the onsite week. A special thanks to the ETSI team for achieving this a second time and for managing to get more testing done, with more features and in half the time compared to the first edition!

So, what was tested?

  • Multi-VNF Network Services lifecycle (instantiation, manual scaling and termination) –> Two or more VNFs, from different vendors, in the same service.
  • Multi-site/VIM Network Service deployments –> VNFs from the same Network Services in different data centers.
  • Enhanced Platform Awareness features (SR-IOV, CPU Pinning, Huge Pages, etc.) –> performance boost for VNFs!
  • End-to-end Performance Management –> to be able to grab metrics from and create thresholds on both the VIM and the VNFs.
  • MANO Automatic Scaling (out/in) capabilities based on performance metrics from both VNFs and VIMs –> to add or remove VNFs/VDUs based on VIM/VNF metrics.
  • End-to-end Fault Management –> events and alarms propagation to higher layers
  • An optional API test track provided by ETSI to experiment with compliance testing on some specific interfaces (NFV SOL 002 for the Ve-Vnfm reference point, NFV SOL 003 for the Or-Vnfm reference point)

Even though our obvious objective was to run tests successfully, ETSI encouraged us to mark results as ‘failed’ whenever interoperability did not work, so that this rich feedback can influence NFV standards in a positive way.

We brought both our MANO solution (WhiteNFV, based on Open Source MANO Release 3) and our VIM solution for cloud and NFV environments (WhiteCloud, based on the OpenStack Pike distribution), so we had to test interoperability with other NFVI/VIM and MANO providers respectively. I’m glad to say that both performed pretty well!

In parts two and three of this series, we’ll dive into the official Plugtest results which ETSI just made public this week, so stay tuned.

About the author

Gianpietro Lavado is a network solutions architect interested in the latest software technologies to achieve efficient and innovative network operations. He currently works at Whitestack, a company whose mission is to promote SDN, NFV, cloud and related deployments all around the world, including a new OpenStack distribution.

This post first appeared on LinkedIn. Superuser is always interested in community content, get in touch at: editorATopenstack.org

The post Getting Network Function Virtualization ready for prime time: Inside the standards tests appeared first on OpenStack Superuser.

by Gianpietro Lavado at February 22, 2018 05:03 PM

February 21, 2018

OpenStack Superuser

Inside CERN’s Open Data portal

Although the European Organization for Nuclear Research occupies a relatively modest patch of land on the French-Swiss border, the scope of CERN’s research is big — from the Higgs boson (aka “God particle”), anti-matter and dark matter, to extra dimensions – and the amount of data generated is truly vast.

Since 2014, the CERN Open Data portal, which runs on OpenStack, has been making it available for high schoolers, data scientists and armchair physics buffs. The most recent information made public in late 2017 includes a petabyte of data, including sets related to the discovery of the Higgs boson glimpsed through the Large Hadron Collider.

Superuser talks to Tibor Simko, technology lead behind the CERN Open Data portal, about the backend as well as the future of the project.

CERN’s Open Data Portal runs on OpenStack — what can you tell us about the backend?

The CERN Open Data portal was first released in 2014. The portal ran on about eight virtual machines on the CERN OpenStack cloud infrastructure. The machines were managed by Puppet. The architecture includes the front-end load balancing servers running HAproxy, dispatching user requests to the caching servers running Nginx that either serve the cached content, or, if needed, dispatch the user request further to the CERN Open Data web application itself. The application runs on top of the Invenio digital repository framework that further uses Redis caching service and SQL relational database services.

The CERN Open Data portal hosts several thousand records representing datasets, software, configuration files, documentation and related supplementary information released as open data by the LHC experiments. The total amount of released data is more than 1400 terabytes. The data assets themselves are stored on the CERN EOS Open Storage system, and the portal relies heavily on this distributed storage system for its backend data storage needs.

What were the resources at your disposal and scope of the portal project when it launched and what are they now?

On the application side, the CERN Open Data portal recently underwent a major change of the underlying repository framework, from Invenio 2 to Invenio 3. We have upgraded our data model, the user interface and improved the faceted search experience. The new portal was released in December 2017.

On the infrastructure side, we have been early adopters of container technologies and used Docker for the CERN Open Data portal development since its beginning. We are now using containers also for the portal production deployment itself using the OpenShift platform.

Besides the changes in the portal technology and deployment, the amount of the open data released by the LHC experiments has grown in an exponential manner. For example, the CMS collaboration released a part of the 2010 datasets of about 30 TB in 2014 initially; the 2011 datasets of about 300 TB were released in 2016, and the 2012 datasets that we just released were about 1 PB in size!

The CERN Open Data portal’s storage component relies heavily on the scalability of the EOS storage system to host large amounts of released open data. Since the CERN EOS system manages over 250 Petabytes at CERN overall, the amount of the open data pool remains moderate when compared to the regular daily usage by the physicists.

What are the challenges particular to planning an open-access cloud like this one?

The challenges were of diverse nature.

First, we needed to organize the datasets, the software, the configuration, the documentation and the auxiliary information so that it would be understandable by non-specialists. We were working closely with the LHC experiments on the data organization and management.

Histograms created from the CMS experiment at CERN’s Open Data Portal.

Second, we had to present the research data to a non-typical audience consisting of data scientists, high-school students and the general public. We’ve integrated tools that let users explore the data via event display visualization or basic histogramming; we’ve also provided detailed guides on how to run more advanced physics analyses on the released primary datasets.


Third, we had to design a scalable system that would be capable of serving possibly thousands of parallel user requests at a time, following the usage peaks coming after widely-covered press releases or social media events.

What can you tell us about future plans?

We plan to improve the discoverability of the CERN Open Data material by exposing our datasets using general standards such as JSON-LD with schema.org. We plan to publish REST API interfaces to enable users to easily write applications against the portal.
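
As a rough illustration of what describing a dataset with JSON-LD and schema.org can look like, here is a minimal Python sketch. The helper name dataset_to_jsonld, the chosen fields and the example values are assumptions made for illustration only; they do not reflect the portal's actual data model.

```python
import json


def dataset_to_jsonld(title, description, license_url, keywords):
    """Describe a dataset as a schema.org Dataset in JSON-LD form.

    The argument names are hypothetical; a real record would carry
    many more fields than this sketch shows.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "license": license_url,
        "keywords": list(keywords),
    }


if __name__ == "__main__":
    # Placeholder values, not a real portal record.
    record = dataset_to_jsonld(
        title="Example collision dataset",
        description="Placeholder description of a released dataset.",
        license_url="https://creativecommons.org/publicdomain/zero/1.0/",
        keywords=["open data", "particle physics"],
    )
    print(json.dumps(record, indent=2))
```

Embedding such a document in a record's landing page is what allows general-purpose search tools to recognize the page as a dataset.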

We are looking forward to forthcoming open data releases from the LHC experiments. We are also excited to host the first open data from a non-LHC experiment, released by the OPERA collaboration.

Finally, we plan to facilitate working with the open data by providing richer and more easily runnable analysis examples.

How did your previous work on the Invenio digital library platform inform this project?

The Invenio digital library framework has been developed at CERN to run services that were originally oriented towards managing articles, books, preprints, theses, audios, photos, videos. Progressively, the experimental particle physics collaborations have been using the open access publications services to share the supplementary material to publications, such as numerical “data behind plots,” the plotting macro snippets, or the event display files. This brought a natural evolution of Invenio’s focus from targeting the “small data” domain towards the “big data” domain.

We have been trying to describe and capture the structured information about the whole research process, including the experimental collision and simulated datasets, the virtual machines, the analysis software and the computational workflows used by the physicists to analyze the data that all together produce the original scientific results and publications. Capturing the research analysis process and its computational workflows in order to make the science more easily reproducible and reusable in the future light of new theories is an exciting topic that coherently subscribes to the wider evolution of the open access movement through open data to open science.

Anything else you’d like to share?

Before joining CERN, I worked in the field of computational plasma physics, all the while being involved with parallel software programming activities and side projects. I therefore feel particularly enthusiastic and privileged to work at CERN in the field of open science that bridges computing and physics. We’re trying to foster better open and reproducible science practices by offering better open science software tools that assist particle physics researchers with their data analysis computing needs.

About Tibor Simko
Tibor Simko holds a PhD in plasma physics from Comenius University Bratislava, Slovakia and from the University of Paris Sud, France. He joined CERN to work as a computing engineer where he founded the Invenio digital repository framework. Simko later worked as a technology director of INSPIRE, the high energy physics information system. He now leads the development of the CERN Analysis Preservation, CERN Open Data and the Reusable Analyses projects. His professional interests include open science and reproducible research, information management and retrieval, software architecture and development, psychology of programming, free software culture and more.

You can find him on Twitter and GitHub.

The post Inside CERN’s Open Data portal appeared first on OpenStack Superuser.

by Nicole Martinelli at February 21, 2018 05:10 PM

OpenStack in Production

Maximizing resource utilization with Preemptible Instances


The CERN cloud consists of around 8,500 hypervisors providing over 36,000
virtual machines. These provide the compute resources for both the laboratory's
physics program and the organisation's administrative operations such
as paying bills and reserving rooms at the hostel.

The resources themselves are generally ordered once or twice a year, with servers being kept for around 5 years. Within the CERN budget, the resource planning team looks at:
  • The resources required to run the computing services for the CERN laboratory. These are projected using capacity planning trend data and upcoming projects such as video conferencing.

With the installation and commissioning of thousands of servers concurrently
(along with their associated decommissioning 5 years later), there are scenarios
to exploit underutilised servers. Programs such as LHC@Home are used, but we have also been interested in expanding the cloud to provide virtual machine instances which can be rapidly terminated in the event of:
  • Resources being required for IT services as they scale out for events such as a large scale web cast on a popular topic or to provision instances for a new version of an application.
  • Partially full hypervisors where the last remaining cores are not being requested (the Tetris problem).
  • Compute servers at the end of their lifetime which are used to the full before being removed from the computer centre to make room for new deliveries which are more efficient and in warranty.

The characteristic of this workload is that it should be possible to stop an
instance within a short time (a few minutes) compared to a traditional physics job.

Resource Management in OpenStack

Operators use project quotas to ensure the fair sharing of their infrastructure. The problem with this is that quotas act as hard limits. This
leads to resources effectively being dedicated to workloads even if they are not used
all the time, or to situations where resources are not available even though
there is quota still to use.

At the same time, the demand for cloud resources is increasing rapidly. Since
there is no cloud with infinite capabilities, operators need a way to optimize
the resource utilization before proceeding to the expansion of their infrastructure.

Resources can therefore sit idle, leaving cloud utilization below the full
potential of the acquired equipment even while the users’ requirements are growing.

The concept of Preemptible Instances can be the solution to this problem. This
type of server can be spawned on top of the project's quota, making use of the
underutilised capabilities. When the resources are requested by tasks with
higher priority (such as approved quota), the preemptible instances are
terminated to make space for the new VM.

Preemptible Instances with OpenStack

Supporting preemptible instances would mirror the AWS Spot Market and the
Google Preemptible Instances. There are multiple things to be addressed here as
part of an implementation with OpenStack, but the most important can be reduced to these:
  1. Tagging Servers as Preemptible
In order to be able to distinguish between preemptible and non-preemptible
servers, there is the need to tag the instances at creation time. This property
should be immutable for the lifetime of the servers.
  2. Who gets to use preemptible instances
There is also the need to limit which user/project is allowed to use preemptible
instances. An operator should be able to choose which users are allowed to spawn this type of VM.
  3. Selecting servers to be terminated
Considering that the preemptible instances can be scattered across the different cells/availability zones/aggregates, there has to be “someone” able to find the existing instances, decide the way to free up the requested resources according to the operator’s needs and, finally, terminate the appropriate VMs.
  4. Quota on top of project’s quota
In order to avoid possible misuse, there could be a way to control the amount of preemptible resources that each user/project can use. This means that apart from the quota for the standard resource classes, there could be a way to enforce quotas on the preemptible resources too.

OPIE : IFCA and Indigo Dataclouds

In 2014, Alvaro Lopez from IFCA carried out the first investigations into
possible approaches (https://blueprints.launchpad.net/nova/+spec/preemptible-instances).
As part of the EU Indigo Datacloud project, this led to the development of the
OpenStack Pre-Emptible Instances package (https://github.com/indigo-dc/opie).
This was written up in a paper in the Journal of Physics: Conference Series
(http://iopscience.iop.org/article/10.1088/1742-6596/898/9/092010/pdf) and
presented at the OpenStack summit (https://www.youtube.com/watch?v=eo5tQ1s9ZxM).

Prototype Reaper Service

At the OpenStack Forum during a recent OpenStack summit, a detailed discussion took place on how spot instances could be implemented without significant changes to Nova. The ideas were then followed up with the OpenStack Scientific Special Interest Group.

Trying to address the different aspects of the problem, we are currently
prototyping a “Reaper” service. This service acts as an orchestrator for
preemptible instances. Its sole purpose is to decide the way to free up the
preemptible resources when they are requested for another task.
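
To make the selection step more concrete, it could be sketched roughly as follows. This is only an illustrative Python sketch and not the prototype's code: the Instance type, the select_victims function and the "largest preemptible instances first" policy are all assumptions chosen for simplicity, and a real service would have to look across hosts, cells and aggregates and honour the operator's own policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    """Hypothetical, simplified view of a server running on one host."""
    name: str
    vcpus: int
    ram_mb: int
    preemptible: bool


def select_victims(instances: List[Instance],
                   free_vcpus: int, free_ram_mb: int,
                   need_vcpus: int, need_ram_mb: int) -> Optional[List[Instance]]:
    """Choose preemptible instances to terminate so that a request fits.

    Returns an empty list if the request already fits, the instances to
    terminate if preemption makes it fit, or None if it cannot fit even
    after terminating every preemptible instance on the host.
    """
    # Assumed policy: free the largest preemptible instances first so
    # that as few instances as possible are terminated.
    candidates = sorted(
        (i for i in instances if i.preemptible),
        key=lambda i: (i.vcpus, i.ram_mb),
        reverse=True,
    )
    victims: List[Instance] = []
    for inst in candidates:
        if free_vcpus >= need_vcpus and free_ram_mb >= need_ram_mb:
            break
        victims.append(inst)
        free_vcpus += inst.vcpus
        free_ram_mb += inst.ram_mb
    if free_vcpus >= need_vcpus and free_ram_mb >= need_ram_mb:
        return victims
    return None
```

Non-preemptible instances are never candidates, which matches the requirement that the preemptible tag is set at creation time and stays immutable.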

The reason for implementing this prototype is mainly to help us identify
possible changes that are needed in the Nova codebase to support Preemptible
Instances.

More on this WIP can be found here: 


The concept of Preemptible Instances gives operators the ability to provide a
more "elastic" capacity. At the same time, it enables the handling of increased
demand for resources, with the same infrastructure, by maximizing the cloud
utilization.

This type of server is perfect for tasks/apps that can be terminated at any
time, enabling the users to take advantage of extra CPU power on demand without the fixed limits that quotas enforce.

Finally, here at CERN, there is an ongoing effort to provide a prototype
orchestrator for Preemptible Servers with OpenStack, in order to pinpoint the
changes needed in Nova to support this feature optimally. This could also be
available in future for other OpenStack clouds in use by CERN such as the
T-Systems Open Telekom Cloud through the Helix Nebula Open Science Cloud


  • Theodoros Tsioutsias (CERN openlab fellow working on Huawei collaboration)
  • Spyridon Trigazis (CERN)
  • Belmiro Moreira (CERN)


by Theodoros Tsioutsias (noreply@blogger.com) at February 21, 2018 01:34 PM

February 20, 2018

Chris Dent

TC Report 18-08

Most TC activity has either been in preparation for the PTG or stalling to avoid starting something that won't be finished before the PTG. But a few discussions to point at.

When's the Next PTG?

Last Tuesday evening had a brief discussion asking when (and where) the next PTG will be, after Dublin. The answer? We don't know yet. It will likely come up in Dublin.

Base Services and Eventlet

A question about base services led to discussion about ways to technically and socially avoid the use of eventlet. Notable was the observation that we continue to have new projects that adopt patterns established in Nova that while perfectly workable are no longer considered ideal. There's some work to do to make sure we provide a bit more guidance.

Naming for S Release

Rocky is starting, so it is time to be thinking about naming for S. Berlin is the geographic location that will be the source for names beginning with "S".

Python 3.6

Most of Friday was devoted to Python 3.6. Many distros are headed that way and OpenStack CI is currently 2.7 and 3.5.

TC Topics at the PTG

A reminder that there is an etherpad for TC topics at the PTG.

Because of the PTG there won't be a TC Report next week, but I will endeavor to write up standalone reports of the discussions started by that etherpad. Those discussions will hopefully grant a bit more vigor, drive, and context to the TC Report, which has wandered a bit of late.

by Chris Dent at February 20, 2018 06:45 PM

OpenStack Superuser

What you need to know about cloud edge computing

There’s a lot of speculation about whether edge computing will disperse the dominance of the cloud.

But what exactly is it?

A group of open infrastructure experts put their heads together for a recent white paper (.PDF) to detail the what, how and when of cloud edge computing.

The OpenStack Foundation (OSF)’s Edge Computing Group counts members from AT&T, Cisco, Ericsson, HPE, Inmarsat, Red Hat, Verizon and Walmart Labs, and the report aims to dispel any doubts: At its simplest, cloud edge computing means “offering application developers and service providers cloud computing capabilities, as well as an IT service environment at the edge of a network.” The basic characteristics of edge computing, the authors elaborate, are that the infrastructure is located closer to the end user, that the scale of site distribution is high and that edge nodes are connected by wide area network (WAN) connections.

Titled “Cloud edge computing: Beyond the data center,” the report details the core benefits (mainly reduced latency and mitigating bandwidth) as well as use cases, scenarios and current challenges involved in cloud edge.

Allowing that there are probably already dozens of ways cloud edge can be used, the authors outline some common use cases:

  • Data collection and analytics: “Internet of things, where data is often collected from a large network of microsites, benefits from the edge computing model. Sending masses of data over often limited network connections to an analytics engine located in a centralized data center is counterproductive…”
  • Real-time/immersive: “AR/VR, connected cars, telemedicine, tactile internet, Industry 4.0 and smart cities, are unable to tolerate more than a few milliseconds of latency and can be extremely sensitive to jitter, or latency variation.” More on how this could change our daily lives here.
  • Self-contained and autonomous site operations: “These could include transportation (planes, buses, ships), mining operations (oil rigs, pipelines, mines), power infrastructure (wind farms, solar power plants) and even environments that should typically have good connectivity, like stores.”
  • Network functions virtualization (NFV): “NFV is at its heart the quintessential edge computing application because it provides infrastructure functionality. Telecom operators are looking to transform their service delivery models by running virtual network functions as part of, or layered on top of, an edge computing infrastructure.”
  • Network efficiency
  • Security
  • Privacy: “Enterprises may have needs for edge computing capacity depending on workloads, connectivity limits and privacy. For example, medical applications that need to anonymize personal health information (PHI) before sending it to the cloud could do this utilizing edge computing infrastructure.”
  • Compliance
Mobile will become a common environment for cloud edge computing.

How to get involved

The 17-page report winds up with a call to action. “We recognize that there is work to be done to achieve our goals of creating the tools to meet these new requirements…and encourage the entire open-source community to join in to define and develop cloud-edge computing.”

In addition to checking for updates on the OSF Edge Computing page, you can help shape the future by:



The post What you need to know about cloud edge computing appeared first on OpenStack Superuser.

by Superuser at February 20, 2018 04:44 PM

February 17, 2018

Doug Hellmann

beagle 0.1.0

beagle is a command line tool for querying a hound code search service, such as https://codesearch.openstack.org. This is the first release of beagle.

by doug at February 17, 2018 12:20 AM

February 16, 2018

OpenStack Superuser

For a happier open source community, give recognition

Every open source community is made up of real people with real feelings. Many open source contributors are working in their free time to provide essential software that we use daily. Sometimes praise is lost in the feedback of bugs or missing features. Focusing on too much negative feedback can lead contributors to frustration and burnout.

However you end up contributing to OpenStack, or any open source project, I believe that what gets people excited about working with a community is some form of recognition.

My first answer to people coming into the OpenStack community is to join our Project Team Gathering event. Significant changes are discussed here to understand the technical details to carry out the work in the new release. You should seek out people who are owners of these changes and volunteer to work on a portion of the work. Not only are these people interested in your success by having you take on some of the work they have invested in, but you will be doing work that interests the entire team. You’ll finish the improvements and be known as the person in the project with the expertise in that area. You’ll receive some recognition from the team and the community using your software. And just like that, you’re hooked because you know your work is making a difference. Maybe you’ll improve that area of the project more, venture onto other parts of the project, or even expand to other open source projects.

If you work in the OpenStack community, there’s also another way you can give and get recognition. In OpenStack IRC channels, you can thank members of the community publicly with the following command:

#thanks <irc_nick> for being a swell person in that heated discussion!

To be clear, replace <irc_nick> with the IRC nick of the person you want to thank.

Where does this information go? Just like the Success Bot in which we can share successes as a community, Thanks Bot will post them to the OpenStack wiki. They will also be featured in the OpenStack Developer Digest.

In developing this feature, I’ve had help and feedback from various members of the community. You can see my history of thanking people along the way, too.

At the next OpenStack event, you’re still welcome to buy a tasty beverage for someone to say thanks. But why not give them recognition now too and let them know how much they’re appreciated in the community?

Mike Perez is the cross-project developer coordinator at the OpenStack Foundation. You can find him as thingee on IRC and Twitter.


The post For a happier open source community, give recognition appeared first on OpenStack Superuser.

by Mike Perez at February 16, 2018 09:53 PM

Keystone authentication for your Kubernetes cluster

Saverio Proto is a cloud engineer at SWITCH, a national research and education network in Switzerland, which runs a public cloud for national universities.

At SWITCH we’re looking to provide a container platform-as-a-service solution. To that end, we’re working on Kubernetes and OpenShift to gauge what’s possible and how a service could be structured. It would be really nice to use the existing OpenStack username and password to authenticate to Kubernetes. We tested this solution and it works great.

How does it work? Let’s start from the client side.

Kubernetes users use the kubectl client to access the cluster. The good news is that since version v1.8.0 of the client, kubectl is able to read the usual OpenStack environment variables, contact Keystone to request a token and forward the request to the Kubernetes cluster using the token. This was merged August 7, 2017. However, I couldn’t find documentation anywhere on how to correctly configure the client to use this functionality, so eventually I wrote some documentation notes here.
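
To give an idea of what requesting a token from Keystone involves, here is a rough Python sketch of a password authentication request against the Keystone v3 API. It is not kubectl's actual implementation: the function names are made up for illustration, and the sketch assumes that OS_AUTH_URL already ends in /v3 and that the domain names default to "Default" when they are not set.

```python
import json
import urllib.request


def build_token_request(env):
    """Build the URL and JSON body of a Keystone v3 password auth request.

    `env` is a mapping holding the usual OS_* variables (for example
    os.environ). Assumes OS_AUTH_URL already points at the /v3 endpoint.
    """
    auth_url = env["OS_AUTH_URL"].rstrip("/")
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": env["OS_USERNAME"],
                        "domain": {"name": env.get("OS_USER_DOMAIN_NAME", "Default")},
                        "password": env["OS_PASSWORD"],
                    }
                },
            },
            "scope": {
                "project": {
                    "name": env["OS_PROJECT_NAME"],
                    "domain": {"name": env.get("OS_PROJECT_DOMAIN_NAME", "Default")},
                }
            },
        }
    }
    return auth_url + "/auth/tokens", body


def request_token(env):
    """Send the request; Keystone returns the token in a response header."""
    url, body = build_token_request(env)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["X-Subject-Token"]
```

After sourcing the credentials file, calling request_token(os.environ) would return the token string that is then sent to the Kubernetes API as a bearer token.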

How does it work on the Kubernetes master side?

The Kubernetes API receives a request with a Keystone token. In Kubernetes language, this is a Bearer Token. To verify the Keystone token the Kubernetes API server will use a WebHook. What does it mean? That the Kubernetes API will contact yet another Kubernetes component that’s capable of authenticating the Keystone token.
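
The webhook exchange can be sketched as follows: the API server sends a TokenReview object containing the bearer token, and the webhook answers with a TokenReview whose status says whether the token is valid and who the user is. The Python below is only an illustration of that exchange, not the code of any real component: review_token and the validate callable (which stands in for a call to Keystone) are hypothetical, and the mapping of the Keystone project ID to a group is an assumption.

```python
def review_token(review, validate):
    """Answer a Kubernetes TokenReview request.

    `review` is the decoded JSON body sent by the API server.
    `validate` is a hypothetical callable standing in for Keystone: it
    takes a token string and returns a dict with "username", "user_id"
    and "project_id", or None if the token is not valid.
    """
    token = review.get("spec", {}).get("token", "")
    identity = validate(token) if token else None

    status = {"authenticated": False}
    if identity is not None:
        status = {
            "authenticated": True,
            "user": {
                "username": identity["username"],
                "uid": identity["user_id"],
                # Assumption: expose the Keystone project as a group.
                "groups": [identity["project_id"]],
            },
        }
    return {
        # Echo the API version used by the caller.
        "apiVersion": review.get("apiVersion", "authentication.k8s.io/v1beta1"),
        "kind": "TokenReview",
        "status": status,
    }
```

Note that this step only establishes who the user is; what that user may do is decided separately by authorization.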

The k8s-keystone-auth component developed by Dims does exactly this. I tested his code and I created a Docker container to integrate k8s-keystone-auth in my kube-system namespace. When you run the k8s-keystone-auth container, you pass the URL of your Keystone server as an argument.

If you’re deploying your cluster with k8s-on-openstack, you can find this integration summarized in a single commit.

Now that everything’s set up, I can try:

source ~/openstackcredentials
kubectl get pods

I will be correctly authenticated by Keystone, but I'll have no authorization to do anything:

Error from server (Forbidden): pods is forbidden: User "saverio.proto@switch.ch" cannot list pods in the namespace "default"

This is because we need to set up some authorization for this Keystone user. You can find detailed documentation about role-based access control (RBAC), but here's a simple example:

kubectl create rolebinding saverio-view --clusterrole view --user saverio.proto@switch.ch --namespace default

Now my user is able to view anything in the default namespace and I'll be able to do kubectl get pods.

Of course, setting up RBAC specific rules for every user is not optimal. You can at least use the Keystone projects that are mapped to kind: Group in Kubernetes.

Here's an example:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: Group
  name: <openstack_project_uuid>
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

You can then achieve a “soft multi-tenancy” where every user belonging to a specific Keystone project has limited permissions to a specific namespace. I call it soft multi-tenancy because all the pods from all the namespaces, depending on your networking solution, could end up on the same network with a completely open policy.

I'd like to thank Dims and the other people on the Kubernetes Slack channel #sig-openstack for all their help while developing this deployment.


This post first appeared on the SWITCH blog. Superuser is always interested in community content, get in touch at editorATopenstack.org.

Cover Photo // CC BY NC

The post Keystone authentication for your Kubernetes cluster appeared first on OpenStack Superuser.

by Saverio Proto at February 16, 2018 02:40 PM


Mitaka Upgrade on Swift : SLO feature is now available.

Dear Customers,

Following the Mitaka upgrade of our object stores, we now offer you more flexibility for your actions on large objects.

Already live, the DLO feature allows you to handle large objects using segments: your large object is split into many segments to be handled by Swift.

The SLO feature now lets you decide how your object is divided into segments.

The configuration is provided through a JSON manifest.

The OpenStack community’s overview on this feature.

How does it work?

  • Using the --segment-size option of the Swift client
  • Using the API directly:

Segments Upload

curl -i -X PUT --data-binary segment1 -H "X-Auth-Token: <token>" https://swift_url/container1/segment

curl -i -X PUT --data-binary segment2 -H "X-Auth-Token: <token>" https://swift_url/container2/segment

curl -i -X PUT --data-binary segment3 -H "X-Auth-Token: <token>" https://swift_url/container2/segment

Manifest upload

Send a PUT request with ?multipart-manifest=put.

The request body is the list of segment details:

[{"path": "/container1/segment", "etag": "<etagoftheobjectsegment>", "size_bytes": 10485760}, ...]

You can get the etag of a segment by making a HEAD request on that object.

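Download object

A plain GET on the manifest object returns the whole large object, with the segments concatenated. Adding ?multipart-manifest=get to the request returns the manifest itself instead.

Delete object

Use ?multipart-manifest=delete in the DELETE request to remove the manifest together with all of its segments.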
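To make the manifest format concrete, here is a minimal sketch that builds the JSON body from segment data held in memory. It assumes the etag of each segment is the MD5 hex digest of its content, which is what Swift reports for an ordinary object. The function name, paths and sample data are hypothetical.

```python
import hashlib
import json


def build_slo_manifest(segments):
    """Return the JSON manifest body for a list of (path, data) segments.

    Each entry carries the segment path, its MD5 hex digest (assumed to match
    the etag Swift reports for the segment) and its size in bytes.
    """
    entries = []
    for path, data in segments:
        entries.append({
            "path": path,
            "etag": hashlib.md5(data).hexdigest(),
            "size_bytes": len(data),
        })
    return json.dumps(entries)


if __name__ == "__main__":
    # Hypothetical segment paths and contents, for illustration only.
    body = build_slo_manifest([
        ("/container1/segment1", b"first part"),
        ("/container1/segment2", b"second part"),
    ])
    print(body)
```

The resulting string is what would be sent as the body of the PUT request with ?multipart-manifest=put, after the segments themselves have been uploaded.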

You can contact our support service at support@cloudwatt.com.

by Horia Merchi at February 16, 2018 12:00 AM

February 15, 2018

OpenStack Superuser

How open source sparks the business of network transformation

Software-defined networking and network function virtualization play an increasingly important role in transforming networks for the 5G and next generations of the network.

Behind these hot acronyms are a world of open-source contributors, raising the question of how open source mixes with business objectives.

That was the key question posed in a recent panel at Network Developer Day, organized by the Out of the Box Network Developers Meetup group in Santa Clara, California. The “cloud gurus” answering were Robert Starmer, founder and CTO of Kumulus Technologies, Google’s Lakshmi Sharma and Sanjit Dang, investment director at Intel Capital.

“The value and the knowledge that’s coming from open source is tremendous,” says Sharma, who says that five or six years ago, networking was “anything but cool” but that efforts like OpenDaylight, OPNFV and OpenStack have put it back in the spotlight. “That’s the reason so many companies have been able to make so much progress.”

She also cites the example of TensorFlow and the strides the open-source software library for data flow programming across a range of tasks is making in artificial intelligence for a number of companies. “Open source has enabled all of us to innovate at various levels, vertically, horizontally — basically any way you can imagine,” she concludes.

There’s a big difference, however, Starmer says, between open source code and an open source community.
“Once you have a community,  you actually have a project. You have people who are able to contribute and start to consume, to adjust the technology to fit real needs…and that drives that open technology into enterprise extensions.  Or maybe you don’t find that and it becomes more of a service organization,” he says.

And on the eternal question of the economics of free and open source development, Dang had a few core thoughts.

“Intel as a company has thrived thanks to adopting open source as well as driven the market for it,” says Dang. “From an Intel Capital perspective, we invest in open source companies all the time. We do look for a critical mass of developers before we start to get into it,” he adds, bringing up the examples of investments in Cloudera and Mirantis. “We look at what the contributions of the company bring to differentiating their business, such as security and other enterprise-ready features.”

Real deployments, challenges in SDN

Responding to an audience question about what happens in the trenches, Starmer says that many approach SDN as an end in itself, with networking companies trying to sell it to clients as “the answer,” a position he describes as “backwards.”

The real issue, he says, is how you apply it. “If you’re a service provider, having software definable and software manageable networks — whether they be classic hardware with software extensions that make it easy to manage or you actually build overlay-modeled SDN networks — the need is the flexibility, the fungibility, the self-service capabilities  that network service provides. ” He says not everyone needs all of these capabilities.

“Deploying an SDN within an enterprise may not make sense —  just to say you have it, ‘I’ve deployed one and it’s wonderful, my network admins like it because it’s more  flexible’ – but it doesn’t necessarily provide any business value.” Service providers, he adds, have an obvious need for it but some enterprises may not. “If there’s business value that we can show, then it absolutely makes sense to go through the learning curve of SDN.”

Catch the whole 109-minute session below.

The post How open source sparks the business of network transformation appeared first on OpenStack Superuser.

by Superuser at February 15, 2018 05:14 PM

February 14, 2018

OpenStack Superuser

Insights from an OpenStack intern

The OpenStack Foundation is currently seeking a marketing intern — info on how to apply here.

I’ve worked with the OpenStack Foundation as an intern for the last 12 months. It was a fantastic placement with an amazing team and I’m very blessed to have had the opportunity. Here are some insights from my time as an intern.

Take the deep dive

As soon as you start, give it your all. Read up on relevant information to help you understand the organization, work hard to meet deadlines, and even deliver tasks well before those due dates hit. Making that first great impression can create a great foundation for lasting working relationships later on. Not only that, it helps for further understanding from co-workers if something happens beyond your control and you can’t complete a particular task on time.

Say yes

Don’t be afraid to say yes! If a new project or opportunity presents itself and you have time for the task, give it a go! You never know what great things it might lead to. I was only a few weeks into my placement when a co-worker suggested I submit a CFP for the OpenStack Summit in Boston. I did, and I got to present my first ever talk at an international conference!

Say “Hi!”

Be proactive about meeting all your colleagues, and take the chance to join both the formal meetings and the social chats. The same goes for events like conferences or meetups: meet as many people as possible, even if only for an informal chat about what their role involves, how they got on their career path and so on. These are great ways to learn about different careers in the technology space. It is also a great chance not only to practice your networking skills (very valuable ones to have!) but also to grow your network.

Ask questions

Sometimes in new roles we feel like asking questions might make us look incompetent. That is completely false! If you’re unsure, this is the best thing to do. Your co-workers will appreciate your interest and it will help you get the task done quicker, instead of being stuck.

Make the pitch

Just because you’re an intern doesn’t mean your ideas won’t be taken seriously. If you have that lightbulb moment, make a pitch to your co-workers. Give them the two-minute elevator pitch for your idea and if they like it, make a formal written project proposal and take it from there. If you’re feeling nervous, remember the worst thing that can happen is that they say no. That’s it! Additionally, if they do say no, you can ask for feedback on the idea and what would make it better. I pitched the idea of a student engagement initiative called OpenStack comes to Campus. With support from colleagues we ran the first one in Melbourne, and now further documentation has been created for this to be replicated around the world.

Seek feedback and grow

Don’t wait for the formal feedback periods such as quarterly or mid-year reviews. I used to ask on my weekly call with my manager if they were happy with my work and if there was anything I could improve on. You can fix any minor issues early and this also shows you want to continuously learn and grow.

Don’t be afraid of mistakes, embrace them and be prepared to learn from them.

It is inevitable that mistakes will be made at some point during your internship. These things happen! As soon as they do, apologize and make amends. From there, reflect on what caused the issue and note how you can prevent it in the future.

Last but not least – Have fun!

I want to sincerely thank the OpenStack Foundation for an amazing year! I had a fantastic time and learned so much in my time with them. I would highly recommend them as a great work team to undertake placement with.

The post Insights from an OpenStack intern appeared first on OpenStack Superuser.

by Sonia Ramza at February 14, 2018 05:12 PM

February 13, 2018

OpenStack Superuser

Taking the OpenStack ops manuals to the next level

This article tackles the topic of unifying the OpenStack Operators community around a common goal of authoring, supporting and maintaining a rich set of documents that help operators in the field deploy and manage their OpenStack environments. Previous documents were static and maintained in a repository that was difficult to contribute to. Migrating them to the OpenStack Wiki allows them to become “living” documents.

The community around OpenStack is very broad, much like the various components that comprise OpenStack, both in terms of skills and regional locality. With such a diverse and dispersed project team, it becomes very difficult to pull them together to work for a common goal.

One very important and often-overlooked component of the OpenStack project is its documentation, which serves as a tool to help bridge this gap.

Today, the documentation is maintained in its own Git repository, paired with each OpenStack release, for anyone who wishes to check out the source and build it. Building the OpenStack manuals from the repository requires some initial setup and dependencies to be installed on your build machine, but once you’ve done that, they should build cleanly without any issues.

Until very recently, those manuals were directly tied to the specific upstream releases of OpenStack. This meant that when an upstream release was marked EOL (“End of Life”) by the OpenStack project team, its repository tag was removed and the manuals would disappear with it. When the build process fell apart, you’d have to spend time patching or debugging the failures. This was the case with the previous EOL releases of the openstack-manuals, which relied on deprecated tags and legacy dependencies that caused build failures.

Removing the manuals for a release that has gone EOL leaves a gap between the versions of OpenStack that are currently running in production and the availability of the documentation needed to install and operate them. Typically production deployments will lag behind development by a point release or two. This means that if you were running a release of OpenStack two versions behind the latest supported version, finding the latest documentation for that older release was nearly impossible unless you knew where to look or were equipped to build the documentation yourself.

To date, many companies providing support for OpenStack, including Canonical, have provided long-term support (“LTS”) for OpenStack running on their supported Linux distributions, as long as you were running a supported release of both components. The burden of providing the “missing” documentation fell upon each distribution and packager to maintain and provide for their customers and clients.

With some effort and discussion with the OpenStack Documentation project team, that gap was closed when the project team agreed to own and maintain the EOL releases of their documentation (the “openstack-manuals”) permanently, starting with the Newton release of OpenStack. The result is that these manuals will continue to be available online and searchable well after Newton and future releases have gone EOL. For more on how this was resolved, check out a previous article posted to Superuser.

This effort directly helps the operators out in the field who are running these OpenStack versions in production today and in the future. But those manuals will remain static and unchanged once their last commit has been made. Only security fixes to the underlying JavaScript and CSS that make them functional in a browser will be made.

There is a second body of material specific to the operators of OpenStack called the “Ops Guide,” and it too was held in its own repository alongside the OpenStack manuals (“ops-guide”). Because there were few to no contributions made to that guide, it was frozen along with the Newton release of OpenStack.

I’ve taken that body of work, built it using the existing tools within the openstack-manuals repository (Sphinx, extensions and other rendering and transformation tools), converted the rendered HTML those tools produce into Markdown and then converted that Markdown into Wikitext intended for import into the upstream OpenStack wiki, currently powered by MediaWiki.

This wiki is intended to be a live, editable resource that OpenStack operators in the field, whether they run current or older versions of OpenStack, can maintain, update and add to, making the collective job of running OpenStack easier, beyond what the “openstack-manuals” tied to each release can provide.

Getting from the reStructuredText (RST) documentation in the openstack-manuals repository to the final Wikitext output first required building the manuals to produce the rendered “ops-guide” HTML. From there, I converted those HTML pages into Markdown using a tool called “pandoc”. It took a few passes of pandoc to massage the content from HTML -> Markdown -> Wikitext. That process looks something like this:

pandoc -f html+lhs -t markdown-raw_html-native_divs-native_spans -o "${fname%.html}".md "$fname"

pandoc -f markdown+lhs -t mediawiki -o "${fname%.md}".wiki "$fname"

You might ask, why do we have to go to Markdown first and then to Wikitext?

There are several subtle, legacy HTML constructs that don’t convert well from HTML directly into native Wikitext, if at all. These include spans, divs, tables and some other elements. Through testing I’ve found that using a secondary transformation format as a neutral intermediate gives us the best rendering for the final version, Wikitext. The work to roll those fixes back into the upstream rendered HTML isn’t considered high priority, as the wiki will diverge quickly with changes and edits once it has been made live and released to the operators.
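The two-pass conversion can also be scripted per page. The sketch below only builds the two pandoc command lines for one HTML file, reusing the format strings from the commands above. The function name and the sample path are assumptions for illustration, and this is not the script that was actually used for the migration.

```python
import shlex
from pathlib import Path


def pandoc_commands(html_file):
    """Return the two pandoc invocations (as argv lists) for one HTML page.

    Step 1 turns HTML into Markdown with raw HTML, divs and spans disabled;
    step 2 turns that Markdown into MediaWiki markup.
    """
    html_path = Path(html_file)
    md_path = html_path.with_suffix(".md")
    wiki_path = html_path.with_suffix(".wiki")
    to_markdown = [
        "pandoc", "-f", "html+lhs",
        "-t", "markdown-raw_html-native_divs-native_spans",
        "-o", str(md_path), str(html_path),
    ]
    to_wikitext = [
        "pandoc", "-f", "markdown+lhs", "-t", "mediawiki",
        "-o", str(wiki_path), str(md_path),
    ]
    return [to_markdown, to_wikitext]


if __name__ == "__main__":
    # "index.html" is a hypothetical file name, for illustration only.
    for argv in pandoc_commands("index.html"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each returned list can be handed to subprocess.run(argv, check=True) on a machine where pandoc is installed.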

Once the conversion was done, each page was imported into the OpenStack Operators’ wiki site and visually adjusted or corrected to provide the best view for visitors and operators/editors. Some of these changes included altering subtle coloring or table formatting, but mostly it was ensuring that the content, sections and headings rendered correctly when the MediaWiki filters and post-processing ran over each page.

The OpenStack wiki content under the OpsGuide section now includes the full, complete, rendered copy of the “ops-guide,” ready for the real-world operators to own, manage, update and maintain going forward.

It’s our hope that the wiki begins to take on a life of its own, and grows organically like the rest of the OpenStack ecosystem.

Author David A. Desrosiers works at Canonical, US Support.

Superuser is always interested in community content – get in touch at editorATopenstack.org

The post Taking the OpenStack ops manuals to the next level appeared first on OpenStack Superuser.

by David Desrosiers at February 13, 2018 05:02 PM

Chris Dent

TC Report 18-07

A few things to report from the past week of TC interaction. Not much in the way of opportunities to opine or editorialize.


Still more on the topic of OpenStack wide goals. There was some robust discussion about the mox goal (eventually leading to accepting the goal, despite some reservations). That discussion somehow meandered into where gerrit is on the "least bad" to "most good" scale.

PostgreSQL and Triggers

Later in the same day there was some discussion about the state of PostgreSQL support and the merit of using triggers to manage migrations. There were some pretty strong (and supported) assertions that triggers are not a good choice.

PowerStackers and Driver Projects

Discussion about having a PowerStackers project evolved into a review of the current thinking on dealing with projects that require access to special drivers or hardware. This is a common discussion in OpenStack because so much of the point of OpenStack is to provide an abstraction over stuff.

Prepping the PTG

There's a growing etherpad of topics to be discussed with the TC Friday morning at the PTG. You should feel free to add topics and show up and keep the TC in check. The room the meeting will be in is glorious in its criminal-mastermind-wonderfulness.

by Chris Dent at February 13, 2018 03:15 PM

February 12, 2018

Chris Dent

Placement Container Playground 3

Continued explorations of winnowing the size of the OpenStack placement service by attempting to remove requirements and adjust code imports, all within a container for easy isolation. The previous discussion is at Placement Container Playground 2.

Today I decided I wanted to run the setup from Placement Scale Fun against the container experiments, so that it was easy to add more placement services on different hosts.

But first, based on the earlier exploration, I integrated three more changes into the container setup:

  • Using nova.db.api directly. This makes it so that nova/db/__init__.py has no code and thus does not inadvertently load in more modules.
  • Moving DB constants to own file. This was needed so that the inventory handler for placement (which uses some of the constants) doesn't have to load other stuff.
  • Isolate configuration loading. This has placement using its own code to load and process configuration so that it doesn't have to import RPC and DB related modules that it doesn't use.

With those changes integrated (the Dockerfile, which makes no claim to being proper, can be found in the placedock repo), along with the changes listed in the previous post, the size of an individual placement UWSGI process shrinks. In my environment after running under load for a while:

  • Master: VSZ: 273924, RSS 106216
  • Lightweight: VSZ 180840, RSS 74452

This is nice, but I suspect there's more room to be saved if we want.
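To put those numbers in proportion, here is a small worked calculation of the relative saving. The figures are the ones quoted above; the helper name is just for illustration.

```python
def percent_saved(before, after):
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100


# Figures quoted in the post (units as reported there).
master = {"VSZ": 273924, "RSS": 106216}
lightweight = {"VSZ": 180840, "RSS": 74452}

for metric in ("VSZ", "RSS"):
    saved = percent_saved(master[metric], lightweight[metric])
    print(f"{metric}: {saved:.1f}% smaller")
```

That works out to roughly a third less VSZ and about 30% less RSS per UWSGI process.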

With that in place, I set up 4 containers running placement. Two on the control plane host, and two on the compute host. The apache configuration for proxying to multiple UWSGI services looks like this (assuming the generic uwsgi stuff is set up elsewhere, which devstack does for you):

<Proxy balancer://placement>
    BalancerMember uwsgi://
    BalancerMember uwsgi://
    BalancerMember uwsgi://
    BalancerMember uwsgi://
</Proxy>

ProxyPass "/placement" balancer://placement

This works well when creating a bunch of instances as described in the scale fun. Requests were nicely balanced around the 4 containers, each of which kept a nice stable size and didn't sweat.

There are a few remaining warts being loaded into the service that it would be nice to avoid:

  • oslo_service is loaded for the sake of setting some log_options. This means eventlet and greenlet are in the mix. These are not needed by placement.
  • castellan is imported by nova/conf/key_manager.py. That is imported as a result of wanting to use nova.conf. This results in a mess of cryptography related modules and libraries being required.

It's not immediately obvious if there are workarounds for these issues. At least not prior to placement having its own configuration file and setup.
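One way to hunt for this kind of baggage is to check which modules a single import drags into the process. The following is a rough sketch of that idea, not the method used for the experiments above: it imports a module in a fresh interpreter and reports what was added to sys.modules. The helper name is an assumption, and modules the probe itself needs (importlib, json and their dependencies) will not show up in the result.

```python
import json
import subprocess
import sys

# Probe run in a fresh interpreter: snapshot sys.modules, import the target,
# and print the names that were added.
_PROBE = """
import importlib, json, sys
before = set(sys.modules)
importlib.import_module(sys.argv[1])
print(json.dumps(sorted(set(sys.modules) - before)))
"""


def modules_loaded_by(module_name):
    """Return the sorted names of modules newly loaded by importing module_name."""
    result = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # "logging" is just a stand-in; the interesting targets here would be
    # modules such as oslo_service or nova.conf, which need to be installed.
    loaded = modules_loaded_by("logging")
    print(len(loaded), "modules loaded, for example:", loaded[:5])
```

Searching the result for names like eventlet or cryptography would show whether a given import pulls them in.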

by Chris Dent at February 12, 2018 10:30 PM

Ed Leafe

Modeling Affinity in Placement

The ‘cloud’ in cloud computing derives from the amorphous location of your resources. Sure, you have a virtual server, but all you really know is that it’s somewhere on your cloud provider’s hardware, and you can reach it at a certain IP address. You generally don’t care about its exact location. There are times, though, … Continue reading "Modeling Affinity in Placement"

by ed at February 12, 2018 08:25 PM

OpenStack Superuser

Superuser Awards nominations open for OpenStack Summit

Nominations for the OpenStack Summit Vancouver Superuser Awards are open and will be accepted through midnight (Pacific Daylight Time) April 6.

All nominees will be reviewed by the community and the Superuser editorial advisors will determine the winner that will be announced onstage at the Summit in May.

The Superuser Awards recognize teams using OpenStack to meaningfully improve business and differentiate in a competitive industry, while also contributing back to the community.

Teams of all sizes are encouraged to apply. If you fit the bill, or know a team that does, we encourage you to submit a nomination here.

Each team should submit the application in the appropriate category. After the community has a chance to review all nominees, the Superuser editorial advisors will narrow it down to four finalists and select the winner.

Since the awards launched at the Paris Summit in 2014, the community has recognized winners at every Summit: users who show how OpenStack is making a difference and providing strategic value in their organization. Past winners include CERN, Comcast, NTT Group, AT&T and Tencent TStack.

The OpenStack community will have the chance to review the list of nominees, how they are running OpenStack, what open source technologies they are using and the ways they are contributing back to the OpenStack community.

Then, the Superuser editorial advisors will review the submissions, narrow the nominees down to four finalists and review the finalists to determine the winner based on the submissions.

When evaluating winners for the Superuser Award, judges take into account the unique nature of use case(s), as well as integrations and applications of OpenStack performed by a particular team.

Additional selection criteria include how the workload has transformed the company’s business, including quantitative and qualitative results of performance as well as community impact in terms of code contributions, feedback, knowledge sharing and the number of Certified OpenStack Administrators (COAs) on staff.

Winners will take the stage at the OpenStack Summit in Vancouver. Submissions are open now until April 6, 2018. You’re invited to nominate your team or nominate a Superuser here.

For more information about the Superuser Awards, please visit http://superuser.openstack.org/awards.

The post Superuser Awards nominations open for OpenStack Summit appeared first on OpenStack Superuser.

by Superuser at February 12, 2018 03:30 PM

February 11, 2018


What is Hyperconverged Infrastructure (HCI) and when should I use it? Pros and cons of HCI and traditional architecture

There are good reasons to use Hyperconverged Infrastructure, but there are also downsides you need to take into consideration.

by Christian Huebner at February 11, 2018 01:07 AM

February 10, 2018

Adam Young

Deleting an image on RDO

So I uploaded a qcow image…but did it wrong. It was tagged as raw instead of qcow, and now I want it gone. Only problem….it is stuck.

$ openstack image delete rhel-server-7.4-update-4-x86_64
Failed to delete image with name or ID 'rhel-server-7.4-update-4-x86_64': 409 Conflict
Image 2e77971e-7746-4992-8e1e-7ce1be8528f8 could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance.

But….I deleted all of the instances connected to it! Come On!

Answer is easy once the code-rage wears off…

When I created a server based on this image, it created a new volume. That volume is locking the image into place.

$ openstack volume list
| ID                                   | Name | Status    | Size | Attached to                      |
| 97a15e9c-2744-4f31-95f3-a13603e49b6d |      | error     |    1 |                                  |
| c9337612-8317-425f-b313-f8ba9336f1cc |      | available |    1 |                                  |
| 9560a18f-bfeb-4964-9785-6e76fa720892 |      | in-use    |    9 | Attached to showoff on /dev/vda  |
| 0188edd7-7e91-4a80-a764-50d47bba9978 |      | in-use    |    9 | Attached to test1 on /dev/vda    |

See that error? I think it’s that one. I can’t confirm now, as I also deleted the available one, since I didn’t need it either.

$ openstack volume delete 97a15e9c-2744-4f31-95f3-a13603e49b6d
$ openstack volume delete c9337612-8317-425f-b313-f8ba9336f1cc
$ openstack image delete rhel-server-7.4-update-4-x86_64

And that last command succeeded.

$ openstack image show  rhel-server-7.4-update-4-x86_64
Could not find resource rhel-server-7.4-update-4-x86_64
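Rather than guessing which volume is holding on to an image, you could filter the volume records by the image they were created from. This is a minimal sketch that works on already-fetched volume records; it assumes each record is a dict with a "volume_image_metadata" entry containing an "image_id", as in Cinder's volume representation. The helper name and all IDs below are made up for illustration.

```python
def volumes_from_image(volumes, image_id):
    """Return the IDs of volumes whose image metadata points at image_id.

    `volumes` is a list of dicts shaped like Cinder volume records; the
    "volume_image_metadata" / "image_id" keys are an assumption about that shape.
    """
    matches = []
    for volume in volumes:
        metadata = volume.get("volume_image_metadata") or {}
        if metadata.get("image_id") == image_id:
            matches.append(volume["id"])
    return matches


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        {"id": "vol-a", "status": "error",
         "volume_image_metadata": {"image_id": "img-1"}},
        {"id": "vol-b", "status": "available"},
    ]
    print(volumes_from_image(records, "img-1"))
```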

by Adam Young at February 10, 2018 12:06 AM

February 09, 2018

Adam Young

Keystonerc for RDO cloud

If you are using RDO Cloud and want to do command line Ops, here is the outline of a keystone.rc file you can use to get started.

unset $( set | awk '{FS="="} /^OS_/ {print $1}' )

export OS_AUTH_URL=https://phx2.cloud.rdoproject.org:35357/v3/
export OS_USERNAME={username}
export OS_PASSWORD={password}
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME={projectname}
export OS_IDENTITY_API_VERSION=3

You might have been given a different AUTH URL to use. The important parts are appending the /v3/ and explicitly setting OS_IDENTITY_API_VERSION=3. Setting both is overkill, but you can never have too much overkill.

Once you have this set, source it, and you can run:

$ openstack image list
| ID                                   | Name                                      | Status |
| af47a290-3af3-4e46-bb56-4f250a3c20a4 | CentOS-6-x86_64-GenericCloud-1706         | active |
| b5446129-8c75-4ce7-84a3-83756e5f1236 | CentOS-7-x86_64-GenericCloud-1701         | active |
| 8f41e8ce-cacc-4354-a481-9b9dba4f6de7 | CentOS-7-x86_64-GenericCloud-1703         | active |
| 42a43956-a445-47e5-89d0-593b9c7b07d0 | CentOS-7-x86_64-GenericCloud-1706         | active |
| ffff3320-1bf8-4a9a-a26d-5abd639a6e33 | CentOS-7-x86_64-GenericCloud-1708         | active |
| 28b76dd3-4017-4b46-8dc9-98ef1cb4034f | CentOS-7-x86_64-GenericCloud-1801-01      | active |
| 2e596086-38c9-41d1-b1bd-bcf6c3ddbdef | CentOS-Atomic-Host-7.1706-GenericCloud    | active |
| 1dfd12d7-6f3a-46a6-ac69-03cf870cd7be | CentOS-Atomic-Host-7.1708-GenericCloud    | active |
| 31e9cf36-ba64-4b27-b5fc-941a94703767 | CentOS-Atomic-Host-7.1801-02-GenericCloud | active |
| c59224e2-c5df-4a86-b7b6-49556d8c7f5c | bmc-base                                  | active |
| 5dede8d3-a723-4744-97df-0e6ca93f5460 | ipxe-boot                                 | active |

by Adam Young at February 09, 2018 10:17 PM

Steve Hardy

Debugging TripleO revisited - Heat, Ansible & Puppet

Some time ago I wrote a post about debugging TripleO heat templates, which contained some details of possible debug workflows when TripleO deployments fail.

In recent releases (since the Pike release) we've made some major changes to the TripleO architecture - we make more use of Ansible "under the hood", and we now support deploying containerized environments. I described some of these architectural changes in a talk at the recent OpenStack Summit in Sydney.

In this post I'd like to provide a refreshed tutorial on typical debug workflow, primarily focussing on the configuration phase of a typical TripleO deployment, and with particular focus on interfaces which have changed or are new since my original debugging post.

We'll start by looking at the deploy workflow as a whole and some heat interfaces for diagnosing the nature of the failure, then we'll look at how to debug directly via Ansible and Puppet. In a future post I'll also cover the basics of debugging containerized deployments.

The TripleO deploy workflow, overview

A typical TripleO deployment consists of several discrete phases, which are run in order:

Provisioning of the nodes

  1. A "plan" is created (heat templates and other files are uploaded to Swift running on the undercloud)
  2. Some validation checks are performed by Mistral/Heat then a Heat stack create is started (by Mistral on the undercloud)
  3. Heat creates some groups of nodes (one group per TripleO role e.g "Controller"), which results in API calls to Nova
  4. Nova makes scheduling/placement decisions based on your flavors (which can be different per role), and calls Ironic to provision the baremetal nodes
  5. The nodes are provisioned by Ironic

This first phase is the provisioning workflow; it is complete once the nodes are reported ACTIVE by nova (i.e. the nodes are provisioned with an OS and running).

Host preparation

The next step is to configure the nodes in preparation for starting the services, which again has a specific workflow (some optional steps are omitted for clarity):

  1. The node networking is configured, via the os-net-config tool
  2. We write hieradata for puppet to the node filesystem (under /etc/puppet/hieradata/*)
  3. We write some data files to the node filesystem (a puppet manifest for baremetal configuration, and some json files that are used for container configuration)

Service deployment, step-by-step configuration

The final step is to deploy the services, either on the baremetal host or in containers. This consists of several tasks run in a specific order:

  1. We run puppet on the baremetal host (even in the containerized architecture this is still needed, e.g to configure the docker daemon and a few other things)
  2. We run "docker-puppet.py" to generate the configuration files for each enabled service (this only happens once, on step 1, for all services)
  3. We start any containers enabled for this step via the "paunch" tool, which translates some json files into running docker containers, and optionally does some bootstrapping tasks.
  4. We run docker-puppet.py again (with a different configuration, and only on one node, the "bootstrap host"). This does some bootstrap tasks that are performed via puppet, such as creating keystone users and endpoints after starting the service.

Note that these steps are performed repeatedly with an incrementing step value (e.g step 1, 2, 3, 4, and 5), with the exception of the "docker-puppet.py" config generation which we only need to do once (we just generate the configs for all services regardless of which step they get started in).
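The ordering described above can be sketched as a simple loop. This is only an illustration of the sequence, not TripleO code; the function name and the task descriptions are made up for the example.

```python
def deployment_tasks(steps=5):
    """Return the ordered task descriptions for a containerized deployment.

    Config generation with docker-puppet.py happens once, during step 1;
    the other tasks repeat for every step.
    """
    tasks = []
    for step in range(1, steps + 1):
        tasks.append(f"step {step}: puppet on the baremetal host")
        if step == 1:
            tasks.append(f"step {step}: docker-puppet.py generates configs for all services")
        tasks.append(f"step {step}: paunch starts containers for this step")
        tasks.append(f"step {step}: docker-puppet.py bootstrap tasks on the bootstrap host")
    return tasks


if __name__ == "__main__":
    for task in deployment_tasks():
        print(task)
```

Knowing where in this sequence a failure happened narrows down which tool (puppet, docker-puppet.py or paunch) to look at first.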

Below is a diagram which illustrates this step-by-step deployment workflow:
TripleO Service configuration workflow

The most common deployment failures occur during this service configuration phase of deployment, so the remainder of this post will primarily focus on debugging failures of the deployment steps.


Debugging first steps - what failed?

Heat Stack create failed.

Ok something failed during your TripleO deployment, it happens to all of us sometimes!  The next step is to understand the root-cause.

My starting point after this is always to run:

openstack stack failures list --long <stackname>

(undercloud) [stack@undercloud ~]$ openstack stack failures list --long overcloud
resource_type: OS::Heat::StructuredDeployment
physical_resource_id: 421c7860-dd7d-47bd-9e12-de0008a4c106
status_reason: |
Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: |

PLAY [localhost] ***************************************************************


TASK [Run puppet host configuration for step 1] ********************************
ok: [localhost]

TASK [debug] *******************************************************************
fatal: [localhost]: FAILED! => {
"changed": false,
"failed_when_result": true,
"outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
"Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",
"Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"
to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8dd0b23a-acb8-4e11-aef7-12ea1d4cf038_playbook.retry

PLAY RECAP *********************************************************************
localhost : ok=18 changed=12 unreachable=0 failed=1

We can tell several things from the output (which has been edited above for brevity). Firstly, the name of the failing resource tells us that:

  • The error was on one of the Controllers (ControllerDeployment)
  • The deployment failed during the per-step service configuration phase (the AllNodesDeploySteps part tells us this)
  • The failure was during the first step (Step1.0)
Then we see more clues in the deploy_stdout: ansible failed while running the task that runs puppet on the host, so it looks like a problem with the puppet code.

With a little more digging we can see which node exactly this failure relates to, e.g we copy the SoftwareDeployment ID from the output above, then run:

(undercloud) [stack@undercloud ~]$ openstack software deployment show 421c7860-dd7d-47bd-9e12-de0008a4c106 --format value --column server_id
(undercloud) [stack@undercloud ~]$ openstack server list | grep 29b3c254-5270-42ae-8150-9fc3f67d3d89
| 29b3c254-5270-42ae-8150-9fc3f67d3d89 | overcloud-controller-0 | ACTIVE | ctlplane= | overcloud-full | oooq_control |

Ok so puppet failed while running via ansible on overcloud-controller-0.
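
When the deploy_stdout is long, it can help to filter out just the puppet errors. The following is a minimal sketch of that idea, assuming the output has been saved as text; the helper names are hypothetical and the parsing is deliberately naive (it only looks for lines that start with "Error:" once quotes and commas are stripped).

```python
import re

# Matches the "at <manifest>.pp:<line>:<column>" part of a puppet error.
LOCATION = re.compile(r" at (\S+\.pp):(\d+):(\d+)")


def puppet_errors(deploy_stdout):
    """Return the puppet 'Error:' messages found in ansible debug output."""
    errors = []
    for line in deploy_stdout.splitlines():
        # The debug task prints each puppet line as a quoted string, so
        # drop surrounding whitespace, quotes and commas before matching.
        text = line.strip().strip('",')
        if text.startswith("Error:"):
            errors.append(text)
    return errors


def error_location(message):
    """Return (manifest_path, line_number) from an error, or None."""
    match = LOCATION.search(message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


SAMPLE = """
fatal: [localhost]: FAILED! => {
"changed": false,
"Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",
"Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"
"""

if __name__ == "__main__":
    for message in puppet_errors(SAMPLE):
        print(error_location(message), message)
```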


Debugging via Ansible directly

Having identified that the problem occurred during the ansible-driven configuration phase, one option is to re-run the same configuration directly via ansible-playbook, so you can either increase verbosity or potentially modify the tasks to debug the problem.

Since the Queens release, this is actually very easy, using a combination of the new "openstack overcloud config download" command and the tripleo dynamic ansible inventory.

(undercloud) [stack@undercloud ~]$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-VOVet0-config
(undercloud) [stack@undercloud ~]$ cd /home/stack/tripleo-VOVet0-config
(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ls
common_deploy_steps_tasks.yaml external_post_deploy_steps_tasks.yaml templates
Compute global_vars.yaml update_steps_playbook.yaml
Controller group_vars update_steps_tasks.yaml
deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml upgrade_steps_playbook.yaml
external_deploy_steps_tasks.yaml post_upgrade_steps_tasks.yaml upgrade_steps_tasks.yaml

Here we can see there is a "deploy_steps_playbook.yaml", which is the entry point to run the ansible service configuration steps.  This runs all the common deployment tasks (as outlined above) as well as any service specific tasks (these end up in task include files in the per-role directories, e.g Controller and Compute in this example).

We can run the playbook again on all nodes with the tripleo-ansible-inventory from tripleo-validations, which is installed by default on the undercloud:

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml --limit overcloud-controller-0
TASK [Run puppet host configuration for step 1] ********************************************************************
ok: []

TASK [debug] *******************************************************************************************************
fatal: []: FAILED! => {
"changed": false,
"failed_when_result": true,
"outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
"Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend",
"exception: connect failed",
"Warning: Undefined variable '::deploy_config_name'; ",
" (file & line not available)",
"Warning: Undefined variable 'deploy_config_name'; ",
"Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"


NO MORE HOSTS LEFT *************************************************************************************************
to retry, use: --limit @/home/stack/tripleo-VOVet0-config/deploy_steps_playbook.retry

PLAY RECAP *********************************************************************************************************
: ok=56 changed=2 unreachable=0 failed=1

Here we can see the same error is reproduced directly via ansible, and we made use of the --limit option to only run tasks on the overcloud-controller-0 node.  We could also have added --tags to limit the tasks further (see tripleo-heat-templates for which tags are supported).
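
If you find yourself re-running the playbook repeatedly with different options, a tiny wrapper that assembles the command line can save some typing. This is just a sketch: the helper name is made up, it only uses the standard ansible-playbook options -i, --limit, --tags and -v, and it prints the command rather than running it.

```python
import shlex


def playbook_command(playbook, inventory, limit=None, tags=None, verbosity=0):
    """Build an ansible-playbook argument list (hypothetical helper)."""
    argv = ["ansible-playbook", "-i", inventory, playbook]
    if limit:
        # Restrict the run to one host or group.
        argv += ["--limit", limit]
    if tags:
        # Only run tasks carrying one of these tags.
        argv += ["--tags", ",".join(tags)]
    if verbosity:
        # -v, -vv, -vvv ... for progressively more output.
        argv.append("-" + "v" * verbosity)
    return argv


if __name__ == "__main__":
    cmd = playbook_command(
        "deploy_steps_playbook.yaml",
        "/usr/bin/tripleo-ansible-inventory",
        limit="overcloud-controller-0",
        verbosity=2,
    )
    print(shlex.join(cmd))
```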

If the error were ansible related, this would be a good way to debug and test any potential fixes to the ansible tasks, and in the upcoming Rocky release there are plans to switch to this model of deployment by default.


Debugging via Puppet directly

Since this error seems to be puppet related, the next step is to reproduce it on the host (obviously the steps above often yield enough information to identify the puppet error, but this assumes you need to do more detailed debugging directly via puppet):

Firstly we log on to the node, and look at the files in the /var/lib/tripleo-config directory.

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ssh heat-admin@
Warning: Permanently added '' (ECDSA) to the list of known hosts.
Last login: Fri Feb 9 14:30:02 2018 from gateway
[heat-admin@overcloud-controller-0 ~]$ cd /var/lib/tripleo-config/
[heat-admin@overcloud-controller-0 tripleo-config]$ ls
docker-container-startup-config-step_1.json docker-container-startup-config-step_4.json puppet_step_config.pp
docker-container-startup-config-step_2.json docker-container-startup-config-step_5.json
docker-container-startup-config-step_3.json docker-container-startup-config-step_6.json

The puppet_step_config.pp file is the manifest applied by ansible on the baremetal host.

We can debug any puppet host configuration by running puppet apply manually. Note that hiera is used to control the step value, this will be at the same value as the failing step, but it can also be useful sometimes to manually modify this for development testing of different steps for a particular service.
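
For development testing, changing the step amounts to rewriting a one-key JSON file. A minimal sketch of that, with a hypothetical helper that is demonstrated against a temporary file rather than the real hieradata path:

```python
import json
import os
import tempfile


def set_step(path, step):
    """Write {"step": step} to path and return the previous value, if any.

    Hypothetical helper; on a real node the file would be the
    config_step.json under /etc/puppet/hieradata and need root access.
    """
    previous = None
    if os.path.exists(path):
        with open(path) as handle:
            previous = json.load(handle).get("step")
    with open(path, "w") as handle:
        json.dump({"step": step}, handle)
    return previous


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "config_step.json")
        set_step(demo, 1)
        print("previous step:", set_step(demo, 3))
```

Returning the previous value makes it easy to restore the original step once testing is finished.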

[root@overcloud-controller-0 tripleo-config]# hiera -c /etc/puppet/hiera.yaml step
[root@overcloud-controller-0 tripleo-config]# cat /etc/puppet/hieradata/config_step.json
{"step": 1}
[root@overcloud-controller-0 tripleo-config]# puppet apply --debug puppet_step_config.pp
Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain

Here we can see the problem is a typo in the /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp file at line 181. I look at the file, fix the problem (ugeas should be augeas), then re-run puppet apply to confirm the fix.

Note that with puppet module fixes you will need to get the fix either into an updated overcloud image, or update the module via deploy artifacts for testing local forks of the modules.

That's all for today, but in a future post, I will cover the new container architecture, and share some debugging approaches I have found helpful when deployment failures are container related.

by Steve Hardy (noreply@blogger.com) at February 09, 2018 05:04 PM

OpenStack Superuser

Intro to networking-vector packet processing

Vector packet processing (VPP) is a fast network data plane, part of the Linux Foundation FD.io project. It’s designed on top of the data plane development kit (DPDK) to run network workloads at breakneck speeds using vector packet processing technology, or what Cisco’s Jérôme Tollet calls its “secret sauce.”

Tollet offered an introduction — and took the risk of setting up a live container demo — at his recent talk at the Free and Open Source Software Developers’ European Meeting, aka FOSDEM.

Networking-VPP is a VPP network driver for Neutron designed to take advantage of VPP’s performance. VPP brings the performance; Networking-VPP brings the resiliency, simplicity, security and scalability required to make it useful in OpenStack environments.

Tollet talks about what VPP is and why it’s a good choice for network workloads, particularly demanding ones such as network functions virtualization (NFV) applications; how the ML2 driver is put together and why you can trust it to hold up in cloud environments (and even debug it); and the performance results when the two are put together in a real-world OpenStack deployment.

Exactly how fast are we talking? Thanks to the “cache effect” of VPP, Tollet showed a graph of IPv4 and IPv6 routing speeds with huge linear upticks, on the order of 12 million packets per second for IPv6 on two cores. “That is pretty cool,” he pronounced, heading into the demo, which you can catch at the 13:20 mark of the video.

The demo shows two “regular Linux” containers in a VM, each with an IPv4 address. He then connects them with a VPP address and runs an iperf client and an iperf server between them. They reach a speed of 36-37 gigabits per second, which is “pretty cool, but what’s actually important is the vector size,” he adds. The vector size reached in the demo? 6.90 for the tab zero output and the TX as well.

All of that is useless, he says, if it’s not integrated somewhere, which is where OpenStack comes in as an ML2 driver. Designed to support NFV in the beginning, it now supports virtual LAN, virtual extensible LAN, “many security features” (including regular OpenStack security groups plus features like JSON web-token certificates), as well as layer 2 and layer 3.

Catch his entire 26-minute presentation on YouTube or below.

Cover Photo // CC BY NC

The post Intro to networking-vector packet processing appeared first on OpenStack Superuser.

by Superuser at February 09, 2018 03:38 PM


Why an involved user community makes for better software

Healthy interaction between the open source development and user community should be a high priority.

by kpfleming at February 09, 2018 08:00 AM

February 08, 2018

Chris Dent

Placement Scale Fun

Some notes on exploring how an OpenStack placement service behaves at scale.

The initial challenge is setting up a useful environment. To exercise placement well we need either or both of lots of instances and lots of resource providers (in the form of compute nodes where those instances can land). In the absence of unlimited hardware this needs to be faked in some fashion.

Thankfully, devstack provides ways to make use of the fake virt driver to boot fake instances that don't consume much in the way of resources (but follow the true API during the spawn process), and to create multiple nova-compute processes on the same host to manage those fake instances.

The process of figuring out how to make this go was a combination of grep, talking to people, and trying and failing multiple times. This summary is much tidier than the "omg, I have no idea what I'm doing" process of fail and fail again that led to it.

Also note that I'm not doing formal benchmarking here. Rather I'm doing human observation of where things go wrong, what variables are involved and how things feel. This is an important precursor to real benchmarking, to have a clue how the system works. The set up I'm using would not be ideal for benchmarking, for example, as the VMs I'm using are on the same physical host (in this case a dual Xeon E5-2620, 32 GB server running ESXi) meaning they impact each other (especially given the way I've configured the VMs), and aren't subject to physical networking.

Another thing to note is that while a lot of this experimentation could be automated, not doing so gives me deeper insight into how things work, exposes bugs that need to be fixed, and has all the usual benefits gained from doing things "the hard way". For formal testing (where repeating things is paramount) all this faffing about by humans would not be good. But for this, it is.

I eventually landed on the following set up with two VMs, one as the control plane (ds1), one as the compute host (cn1).

  • ds1 is a 16 core, 16GB VM. It's hosting control plane services and mysql and rabbitmq. This is where the scheduler and placement run.
  • cn1 is a 10 core, 11GB VM and is running 75 nova-compute processes, the metadata server, and the neutron agent.
  • To limit message bus traffic, notifications are configured to only send unversioned rather than the default of both. There's currently no easy way to disable notifications entirely.
  • The "Noop" quota driver is used because we don't want to care about quotas in this case.
  • The filter scheduler is used, but all filters are turned off.

These last two tricks were learned from some devstack experiments by Matt Riedemann.

Both VMs are Ubuntu Artful, both are using master for all the OpenStack services, except for devstack itself, which needs this fix (to a bug caused by me).

The devstack configurations are relatively straightforward, the important pieces are:

  • Setting the virt driver: VIRT_DRIVER=fake
  • Telling devstack how many fake compute nodes we want: NUMBER_FAKE_NOVA_COMPUTE=75. This will create multiple compute nodes each of which uses a common config file, plus a config file unique to the process that sets the host name of the nova-compute process (required to get unique resource providers).
  • Manipulating the nova.conf with a [[post-config|$NOVA_CONF]] section to set a few things.
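
As a sketch of what such a post-config section might contain, the following Python renders the three nova settings used in this experiment into that shape. The helper is hypothetical, and the section names ([quota], [filter_scheduler], [notifications]) are my assumption based on nova's configuration reference, so check them against the release you are running.

```python
def post_config_block(settings, target="$NOVA_CONF"):
    """Render a devstack post-config section from {section: {key: value}}."""
    lines = [f"[[post-config|{target}]]"]
    for section, options in settings.items():
        lines.append(f"[{section}]")
        for key, value in options.items():
            lines.append(f"{key} = {value}")
    return "\n".join(lines)


# Values are written exactly as they should appear in the file; the section
# names are an assumption, not something stated in the text above.
NOVA_SETTINGS = {
    "quota": {"driver": '"nova.quota.NoopQuotaDriver"'},
    "filter_scheduler": {"enabled_filters": "'\"\"'"},
    "notifications": {"notification_format": "unversioned"},
}

if __name__ == "__main__":
    print(post_config_block(NOVA_SETTINGS))
```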

The local.conf for cn1 (the compute host) is:


driver = "nova.quota.NoopQuotaDriver"
enabled_filters = '""'
notification_format = unversioned

I'm using static IPs because it makes things easier. If you are trying to repeat this in your own environment your HOST_IP and SERVICE_HOST will likely be different. Everything else ought to be the same. Explicitly setting ENABLED_SERVICES ensures that only the stuff you really need is running. See Multi-Node Lab for some more information on multi-node devstack (Note that there is a lot in there you don't need to care about if you aren't actually going to use the VMs that you create in the deployment).

The local.conf for the control plane (ds1) mostly uses defaults but disables some services that we don't care about, and adjusts the nova config as required:


disable_service horizon
disable_service dstat
disable_service tempest
disable_service n-cpu
disable_service q-agt
disable_service n-api-meta

driver = "nova.quota.NoopQuotaDriver"
enabled_filters = '""'
notification_format = unversioned

Note that we are disabling the services that will be running on the compute host.

There are redundancies between these two files. Some of the stuff required by one is in the other. This is because I started out with nova-compute on both hosts and haven't fully rationalized the local.conf files.

Now that we know what we're building we can build it. The control plane (ds1) needs to be in place first so build devstack there first:

cd wherever_devstack_is
./stack.sh

and wait. When it completes do the same on the compute host (cn1).

When that is done, the control host needs to be made aware of the compute host, after which you can verify the presence of the 75 hypervisors:

. openrc admin admin
nova-manage cell_v2 discover_hosts
openstack hypervisor list

Playing With It

Once all that is done it is possible to send a few different workload patterns at the service. It's hard to do this in a way that isolates any particular service as they all interact so much.

In my first round of experiments, yesterday, I tried a few different scenarios to get a sense of how things worked and what variables exist.

When booting a large number of servers from a small number of nova boot commands with a high min-count (e.g., 1000) the placement api processes are lost as noise in the face of the much greater effort being made by nova-conductor.

It is only when a larger number of smaller requests (15 concurrent requests for 50 instances each) are made that the placement API begins to show any signs that it is working hard. This is about what you would expect: talking to /allocation_candidates is certainly where most effort happens and most data is processed.

Today I decided to narrow things down to making lots of parallel boots of single instances, to impact the placement service as much as possible.

If you intend to start many nova boot (or openstack server) commands at the same time, make sure you do them from a third machine. I tried to do 300 nova boot commands, and pushed my load average over 400 and brought the world to a complete stop.

In the current devstack (February 2018) we can use built in flavor and image references when making a boot request. In addition, since we are making fakes we can set the nic to none. This boots one server named foobar using the m1.tiny flavor:

nova boot --flavor 1 --nic none --image cirros-0.3.5-x86_64-disk foobar

We can boot 1000 of those with:

nova boot --flavor 1 --nic none --image cirros-0.3.5-x86_64-disk --min-count 1000 foobar

Each instance will get a numeric suffix. As stated above this doesn't stress placement much.

If we do want to stress placement we need to increase the number of concurrent requests to GET /allocation_candidates, at which point the number of instances per boot request is less of an issue. One way to do this is to background a mess of boot commands:

for i in {1..100}; do \
nova boot --flavor 1 --nic none --image cirros-0.3.5-x86_64-disk ${i}-foobar & \
done

But more often than not this will cause the calls to the nova-scheduler process to time out when the conductor tries to call select_destinations. We can work around this by hacking nova-scheduler to have more workers. Since this is something that requires a hack, presumably there's a reason for it.

diff --git a/nova/cmd/scheduler.py b/nova/cmd/scheduler.py
index 51d5aee4ac..d794eacaf3 100644
--- a/nova/cmd/scheduler.py
+++ b/nova/cmd/scheduler.py
@@ -45,5 +45,5 @@ def main():

      server = service.Service.create(binary='nova-scheduler',
-    service.serve(server)
+    service.serve(server, workers=4)

Running four nova-scheduler workers, the above nova boot command works fine with no timeout. However, code to do this was never merged for reasons (which may or may not still be valid with the existence of placement) discussed on the review and in a related email.
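
A crude back-of-the-envelope model suggests why extra workers would make the difference. It assumes each worker handles requests strictly one at a time with a fixed service time; both that model and the one-second figure are invented for illustration and were not measured. The 60 seconds is the default rpc_response_timeout in oslo.messaging.

```python
import math

RPC_TIMEOUT = 60  # seconds; default rpc_response_timeout in oslo.messaging


def worst_case_wait(requests, workers, seconds_per_request):
    """Time until the last queued request is answered, in seconds.

    Toy model: requests arrive at once and each worker serves them
    one after another at a fixed cost per request.
    """
    rounds = math.ceil(requests / workers)
    return rounds * seconds_per_request


if __name__ == "__main__":
    for workers in (1, 4):
        # 1.0 second per select_destinations call is a made-up figure.
        wait = worst_case_wait(100, workers, 1.0)
        verdict = "times out" if wait > RPC_TIMEOUT else "fits"
        print(f"{workers} worker(s): last reply after {wait:.0f}s, {verdict}")
```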

Then I tried:

for i in {1..500}; do \
nova boot --flavor 1 --nic none --image cirros-0.3.5-x86_64-disk ${i}-foobar & \
done

500 parallel boots. This caused the Apache process (which provides a front end to keystone, glance, the compute api, and placement) to freeze up and need MaxRequestWorkers raised. Apache (in a default configuration) is a pretty weak link in this stuff. It's easy to see why people prefer nginx in situations where all the web server is really doing is being a reverse proxy.

Once Apache is sorted, then it is my (non-VM) machine doing the nova boots that suffers. It seems that 500 nova boot commands that are doing actual work, instead of just timing out trying to contact a stuck web server, is not a happy way to be. 15 minutes later it woke up and boots started. shrug.

At which point select_destinations started timing out again. 4 workers not enough? I can (and did) raise it to eight but it doesn't change the fact that my 500 parallel nova boot commands get stuck if run from one machine, and at the moment I've run out of free hardware.

So instead I've spread the load a bit:

for j in {1..10} ; do \
  for i in {1..50}; do \
    nova boot --flavor 1 --nic none --image cirros-0.3.5-x86_64-disk ${i}-foobar & \
  done ; \
  sleep 60 ; \
done

After this I get 500 ACTIVE instances in fairly short order. The processes which seem to get the most work are cell1 conductor, interleaved with the nova-scheduler.
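
The same batching pattern can be written in Python if you want to vary the batch size or pause between runs. This is a sketch with hypothetical names: the launch callable stands in for whatever actually starts a boot (for example a subprocess call to nova boot), and unlike the shell loop it puts the batch number in the server name and skips the pause after the final batch.

```python
import time


def boot_in_batches(batches, per_batch, pause, launch, sleep=time.sleep):
    """Call launch(name) per_batch times per batch, pausing between batches.

    Returns the number of launches requested.
    """
    launched = 0
    for batch in range(1, batches + 1):
        for i in range(1, per_batch + 1):
            # Including the batch number keeps every server name unique.
            launch(f"{batch}-{i}-foobar")
            launched += 1
        if batch < batches:
            # Give the control plane time to drain before the next burst.
            sleep(pause)
    return launched


if __name__ == "__main__":
    names = []
    # Dry run: collect names instead of booting, and do not really sleep.
    total = boot_in_batches(10, 50, 60, names.append, sleep=lambda s: None)
    print(total, "boot requests, first:", names[0], "last:", names[-1])
```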

At this stage it makes sense to check that the placement database has expected data:

  • 1500 allocations: correct. (3 for each instance)
  • 75 resource providers: correct.
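
The expected allocation count is simple arithmetic: one allocation record per resource class per instance. The sketch below assumes the three classes are VCPU, MEMORY_MB and DISK_GB, which is what a flavor with CPU, RAM and disk would normally request; that list is my assumption, as the count above only says there are three per instance.

```python
# Assumed resource classes consumed by each instance (not stated above).
RESOURCE_CLASSES = ("VCPU", "MEMORY_MB", "DISK_GB")


def expected_allocations(instances, resource_classes=RESOURCE_CLASSES):
    """Number of allocation records expected for a count of instances."""
    return instances * len(resource_classes)


if __name__ == "__main__":
    print(expected_allocations(500))
```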

Then it is time to delete all those servers:

openstack server list -f value -c ID | xargs openstack server delete

While that is happening it is again the cell conductor that sweats.

0 allocations in the placement db when that's done. ✔

Random Observations

Some thoughts that didn't quite fit in anywhere else:

  • We know this already, but an idle compute-manager is fairly chatty with the placement service. If you have 75 of them, that chat starts to add up: Approximately 246 requests per minute, checking on the state of inventory and allocations. Work is already in progress to investigate this, but it should be noted that the placement service handles this traffic with aplomb. In fact at no point during the entire exercise did the placement service sweat.
  • It makes sense that if you're going to have 8 conductors you want at least 8 schedulers?
  • This stuff simply won't work without multiple scheduler workers. If the rpc timeout limit is raised that can make things work but only very slowly. This suggests that it is important for us to a) make sure that multiple workers are safe, b) change the code (as in the diff above) so that workers can happen, c) recommend doing it.
  • The placement uWSGI processes appear fairly stable memory-wise.
  • It's important to note that no traits, custom resource classes, nested resource providers, aggregates or shared resource providers are used here. Having any of those in the loop could impact the profile. We don't yet know.
  • The control plane host is working at full tilt through all of this. The compute host not much at all (because it is fake). This suggests that distributing the control plane services broadly is important. I will probably try to integrate these experiments with my placement container experiments, putting those containers on a different host. It looks like having the cell conductor elsewhere would be interesting to observe as well.
  • Doing this kind of thing is a huge learning experience and a valuable use of time (despite taking a lot of time). I wish I could remember that more often.
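
To put the idle chatter from the first observation in perspective, the per-node rate is easy to work out. The projection to larger node counts below assumes the traffic scales linearly with the number of compute processes, which is an assumption and not something measured here.

```python
def per_node_rate(total_per_minute, nodes):
    """Average placement requests per minute from one idle compute."""
    return total_per_minute / nodes


def projected_per_minute(nodes, rate):
    """Linear projection of idle traffic for a given number of computes."""
    return nodes * rate


if __name__ == "__main__":
    rate = per_node_rate(246, 75)
    print(f"{rate:.2f} requests per minute per compute")
    print(f"{projected_per_minute(1000, rate):.0f} per minute at 1000 computes")
```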

by Chris Dent at February 08, 2018 05:15 PM

OpenStack Superuser

Making cities smarter with internet of things

Back in 2014, researchers at the University of Messina in Sicily put their heads together to make their city smarter. They started brainstorming — and then successfully crowdfunding — projects that could improve traffic, optimize energy consumption and prevent crime in a metropolitan area of about 650,000 people.

They founded SmartMe.IO, an academic spin-off born out of that research team from the Mobile and Distributed Systems Lab (MDSLab) at the University of Messina, to create vendor-neutral, open source solutions “on a shoestring budget,” according to a recent feature in “Linux Magazine.”

They developed an internet-of-things framework, dubbed Stack4Things, to collect information from the devices. Stack4Things collects data in real time and, using network virtualization, provides infrastructure-as-a-service-like services from a pool of IoT devices.

To morph Messina into a smart city, an open data platform has been set up through the employment of low-cost microcontroller boards equipped with sensors and actuators and installed on buses, lamp posts, and buildings of local institutions, all over the metro area.

That’s where OpenStack comes in, says Cristiano Bellucci, who works for partner Fujitsu. Monasca, the OpenStack monitoring service, analyzes and presents the measured data and also watches over the health of the IoT nodes. Stack4Things and Monasca together help to collect, analyze and present the information to users who can decide the best course of action to improve the environment of the city.

So far, the team has devices for urban mobility, e-Tourism, environmental monitoring and device fleet management in place. “The weather is warm, the food is great, the people are friendly, if I can offer some advice for your next holiday, try Messina,” Bellucci says in a talk at the OpenStack Summit Boston. “And if you look around, you’ll see devices like this one all over.”

Next up? The team will be able to offer “complete end-to-end solutions so that future projects don’t have to worry about the infrastructure” and can consume the data on their own.

Catch the full story in Linux Magazine or the video from the Summit Boston session below.

Cover Photo // CC BY NC

The post Making cities smarter with internet of things appeared first on OpenStack Superuser.

by Superuser at February 08, 2018 03:25 PM

February 07, 2018

OpenStack Superuser

Operator spotlight: Universitat Politècnica de Catalunya

We’re spotlighting users and operators who are on the front lines. These users are taking risks, contributing back to the community and working to secure the success of their organization in today’s software-defined economy. We want to hear from you, too: get in touch with editorATopenstack.org to share your story.

Here we catch up with Andrés Sánchez, master’s student and research support engineer at BAMPLA research group of the Universitat Politècnica de Catalunya (UPC), also known as BarcelonaTech.

Describe how you are using OpenStack. What kinds of applications or workloads are you currently running on OpenStack?

We have two OpenStack deployments:

1. Academic: we have an OpenStack deployment in one of our class laboratories where students learn the basics of OpenStack operation and management. In this laboratory, we also run a software-defined-networking lab, where we use OpenStack to deploy SDN controllers that students set up to control physical networking devices via OpenFlow.

2. Research: This is a “production” OpenStack deployment with two main functions. The first is to provide students doing their thesis with resources for conducting simulations or any kind of workload they require (the virtual machines are usually used for running simulations or deploying SDN scenarios like Mininet or SDN controllers). These research activities have the backing of the Spanish government under the Ministerio de Economía y Competitividad, specifically as TEC2016-76795-C6-1-R and AEI/FEDER, UE. The deployment is also currently supporting several Spanish and European-funded research projects such as 5GCity and 5G Barcelona.

The other function (which is the one I run) is to support NFV/SDN research activities. I use this OpenStack as the VIM (in terms of the ETSI NFV architecture) for deploying VNFs as part of proofs of concept or deploying any open-source NFV-related project (for example, OPNFV, Open Source MANO, CORD); I also use it to deploy SDN controllers for research.

What have been the biggest benefits to your organization as a result of using OpenStack? How are you measuring the impact?

The biggest benefits are:

– An open-source cloud deployment enables us to conduct research on trending SDN/NFV topics. A cloud-computing platform is the basic building block of the NFV paradigm, so it is paramount to have an operating deployment in order to get on board with NFV. OpenStack is seen as the standard platform for open-source NFV, so in order to get involved in these projects you need to have an OpenStack deployment. We “measure” this by successfully deploying or getting involved in any NFV open-source project or environment which we believe will produce academic publications.

– From an educational point of view, providing students with a private cloud for them to work on their projects instead of relying on their own equipment is a huge benefit, since they don’t have to worry about resource constraints and their work becomes more portable and re-usable, because VMs can be easily saved, imported and deployed on reproducible environments. I also think this will be “measured” with the results of academic publications.

What is a challenge that you’ve faced within your organization regarding OpenStack and how did you overcome it?

For me, the biggest challenge with OpenStack was the steep learning curve required to operate and manage the platform, as well as deploy it. It is a broad subject that is not easy to undertake. We overcame it through a lot of trial and error. I feel that more documentation containing real examples or use-case scenarios would be extremely beneficial for the community.

Superuser wants to hear more from operators like you, get in touch at editorATopenstack.org


Cover image courtesy UPC.

The post Operator spotlight: Universitat Politècnica de Catalunya appeared first on OpenStack Superuser.

by Nicole Martinelli at February 07, 2018 04:44 PM

Chris Dent

Placement Container Playground 2

This is a brief followup to placement container playground, wherein I've been sticking placement into a container with the smallest footprint as a way of exploring the issues involved with extracting placement from nova.

I have two series that experiment with changes that could reduce the footprint:

But it turns out that neither of these is enough when integrating them into the container.

A newly revealed challenge is that when importing nova.conf (to access the global oslo_config) all the configuration is imported, leading to lots of things needing to import their own requirements. So, for example, conf/key_manager.py imports castellan.

Placement doesn't need that.

Then, importing the db_api imports the cells_rpcapi, which eventually leads to nova/objects/fields.py being imported, which wants cursive.

cells_rpcapi also imports the network_model which eventually leads to nova.utils, which wants os_service_types.

Placement doesn't need that either.

Placement only uses the db_api module to create a context manager that provides a database session. It's possible to isolate this code. A WIP now exists to Isolate the placement database config.

With that in place we hit the next roadblock: The database model/schema files are in the nova.db package and __init__.py in there does this: from nova.db.api import *. Which means the cells_rpcapi stuff mentioned still gets imported. Argh!
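
The effect is easy to reproduce in isolation. The sketch below builds a throwaway package (the demo_db name and its modules are invented for illustration and have nothing to do with the real nova layout beyond the star import) and shows that importing only the models module still drags in everything the package's __init__.py imports.

```python
import importlib
import os
import sys
import tempfile

# File contents for a hypothetical package that mimics the pattern.
DEMO_FILES = {
    "__init__.py": "from demo_db.api import *\n",
    "api.py": "import demo_db.heavy\n\n\ndef get_session():\n    return 'session'\n",
    "heavy.py": "LOADED = True\n",
    "models.py": "class Allocation:\n    pass\n",
}


def modules_pulled_in_by_models():
    """Import demo_db.models and report which demo_db modules got loaded."""
    with tempfile.TemporaryDirectory() as root:
        package_dir = os.path.join(root, "demo_db")
        os.makedirs(package_dir)
        for name, body in DEMO_FILES.items():
            with open(os.path.join(package_dir, name), "w") as handle:
                handle.write(body)
        sys.path.insert(0, root)
        try:
            # Only the models are requested, but Python runs the package's
            # __init__.py first, and its star import loads api and heavy.
            importlib.import_module("demo_db.models")
            return sorted(m for m in sys.modules if m.startswith("demo_db"))
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(root)
            for name in [m for m in sys.modules if m.startswith("demo_db")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(modules_pulled_in_by_models())
```

Here demo_db.heavy plays the role of the unwanted dependency: nothing in models.py refers to it, yet it is loaded anyway.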

It might be an option to move the model files into a different place. Changing the nova/db/__init__.py would lead to changes in many other files, but might be worth it as importing any code in nova/db will trigger the code in __init__.py.

There will probably be a -3 followup to this, with more steps forwards and more steps back.

by Chris Dent at February 07, 2018 10:15 AM

OpenStack Blog - Swapnil Kulkarni

My Interview at JetBrains for OpenStack Development with PyCharm

I was recently interviewed by Dmitry Filippov, Product Marketing Manager at JetBrains, about OpenStack development with PyCharm. Here is the link for the interview.

by Swapnil Kulkarni at February 07, 2018 02:37 AM

February 06, 2018

OpenStack Superuser

How Network Functions Virtualization helps service providers fail fast

Network Functions Virtualization (NFV) has been a buzzword for about five years.

Heather Kirksey, director for Open Platform for NFV (OPNFV)  at the Linux Foundation gave a recent interview to the “Women in Tech Show” that offered a primer on it and the impact it’s having on telecoms.

First, she gave her a definition of network services. “(They) are communication-oriented services that typically have network-intensive requirements (video requires bandwidth, packets need to arrive in right order, with the right quality); they make a demand on the network and require collaboration,” she says.

NFV, then, is the product of two big trends in the last seven or eight years: the rise of cloud offers the ability to deploy workloads on fairly generic hardware and scale out, rather than up, to handle more services. The scaling  can be done elastically and dynamically, scaling up when you need more and pulling back when you don’t, she says.

The other trend is software-defined networking (SDN), which decouples control of the network (how the network is managed and how traffic is routed among endpoints) from the data plane, which carries the actual user traffic.

“NFV is taking those two trends and using them to enable moving away from proprietary network elements to treating all the things in the network as general pools of compute in order to enable network services,” Kirksey says. “Instead of all these special, purpose-built custom pieces of hardware you can turn most of that intelligence that lived in hardware into software intelligence and start treating the network itself as well as the apps and services as cloud software applications.”

The impact of NFV? “Increased agility. Less risk,” Kirksey says.

“Agility because the services are being re-architected with more automation and more similarities. Instead of having to go into the command-line interface of a hundred different things to enable a subscriber or a service, you can enable these things the way IT professionals are used to enabling new applications on the network. It makes them easier to roll out.”

It also means you don’t have to put special pieces of new hardware in the network for new services, making it easier to try new things. It allows service providers to fail fast, fail often, scale up when they find something successful and take “that Silicon Valley approach to trying out things.”

“It’s really hard for service providers to do that right now, but if you’re using just software applications, you can upgrade and deploy or get rid of if they don’t connect with your subscriber base. You haven’t gone and put things in people’s sidewalks, dug for new cable or put special hardware that you had custom-built for that purpose. It really enables a lot more freedom, agility and flexibility.”

Catch the entire 36-minute interview — it also touches on containers, Kubernetes and OpenStack — on the “Women in Tech Show” podcast.

Cover Photo // CC BY NC

The post How Network Functions Virtualization helps service providers fail fast appeared first on OpenStack Superuser.

by Superuser at February 06, 2018 05:02 PM

Chris Dent

TC Report 18-06

Nothing revolutionary in the past week of Technical Committee discussion. At least not that I witnessed. If there's a revolutionary cabal somewhere, pretty please I'd like to be a part of it.

Main activity is (again) related to openstack-wide goals and preparing for the PTG in Dublin.

PTG Planning

The schedule has mostly solidified. There are some etherpads related to post lunch discussions, including a specific one for Monday. See the irc logs for more context.

I've seen a fair number of people saying things like "since the PTG is coming up soon, let's talk about this there." Given how rare the face to face meetups are, I would hope that we could orient the time for talking about those things which are only (or best) talked about in person and keep the regular stuff in email. Long term planning, complex knowledge sharing, and conflict resolution are good candidates; choosing the color of the shed, not so much.

The PTG is expected to sell out; nine tickets left this morning.

Feedback Loops

Monday had a broad ranging discussion about gaps in the feedback loop, notably feedback from users who have as their primary point of contact their vendor. There was some sentiment of "we do a lot to try to make this happen, at a certain point we need to move forward with what we've got and trust ourselves".

As all conversations eventually do, this led to talk of LTS and whether renaming branches might be helpful. The eventual decision was that it was more trouble than it was worth.

If you have some feedback you'd like to make, or something you think needs to be discussed at the PTG, please show up to office hours, send some email, or write something on one of the many PTG etherpads that are brewing. Thank you.

by Chris Dent at February 06, 2018 12:30 PM

February 05, 2018

OpenStack Superuser

Using JSON home on a Keystone server

Say you have an AUTH_URL like this:

$ echo $OS_AUTH_URL

And now you want to do something with it. You might think you can get the info you want from the /v3 url, but it does not tell you much:

$ curl $OS_AUTH_URL
 {"version": {"status": "stable", 
              "updated": "2016-10-06T00:00:00Z", 
              "media-types": [{"base": "application/json", 
                               "type": "application/vnd.openstack.identity-v3+json"}], 
              "id": "v3.7",
              "links": [{"href": "http://openstack.hostname.com:5000/v3/",
                         "rel": "self"}]}
[ayoung@ayoung541 salab]$

Not too helpful. It turns out, though, that there is data; it just requires the json-home Accept header.

You access the document like this:

$ curl $OS_AUTH_URL -H "Accept: application/json-home"

I’m not going to paste the output: it is huge.

Here is how I process it:

$ curl $OS_AUTH_URL -H "Accept: application/json-home" | jq '. | .resources '

That will format it somewhat legibly. To get a specific section, say the endpoint list, you can find it in the doc like this:

"http://docs.openstack.org/api/openstack-identity/3/rel/endpoints": {
"href": "/endpoints"

And to pull it out programmatically:

$ curl -s $OS_AUTH_URL -H "Accept: application/json-home" | jq '.resources
  | .["http://docs.openstack.org/api/openstack-identity/3/rel/endpoints"]
  | .href'
"/endpoints"
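The same lookup can be sketched in Python once the JSON home body has been fetched. This is only an illustration: the helper name href_for and the trimmed sample_body are made up for the example, and only the relation URL and the "/endpoints" href come from the document shown above.

```python
import json

# Relation key as it appears in the JSON home document above.
ENDPOINTS_REL = "http://docs.openstack.org/api/openstack-identity/3/rel/endpoints"


def href_for(json_home, relation):
    """Return the href registered for a relation in a JSON home document."""
    return json_home["resources"][relation]["href"]


# A trimmed, hypothetical stand-in for the real (much larger) response body.
sample_body = json.dumps({
    "resources": {
        ENDPOINTS_REL: {"href": "/endpoints"},
    }
})

print(href_for(json.loads(sample_body), ENDPOINTS_REL))
```

In real use the body would come from the curl request above (or any HTTP client sending the same Accept header) rather than from a hand-written sample.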


This post first appeared on Adam Young’s blog.

For more on Keystone, an OpenStack service that provides API client authentication, service discovery and distributed multi-tenant authorization, check out the project Wiki.

Superuser is always interested in community content – get in touch: editorATopenstack.org

The post Using JSON home on a Keystone server appeared first on OpenStack Superuser.

by Adam Young at February 05, 2018 05:33 PM

Ed Leafe

A Guide to Alternate Hosts in Nova

One of the changes coming in the Queens release of OpenStack is the addition of alternate hosts to the response from the Scheduler’s select_destinations() method. If the previous sentence was gibberish to you, you can probably skip the rest of this post. In order to understand why this change was made, we need to understand the … Continue reading "A Guide to Alternate Hosts in Nova"

by ed at February 05, 2018 05:06 PM

Chris Dent

Placement Queens Summary

This is a summary of the main changes made to the Placement service in OpenStack, including only visible changes made to the placement service itself. This is cribbed from the commit history, but is only highlights, not a complete reckoning. There are plenty of other changes, both under the hood, and on the nova-side of the equation. Listing all that would make for a very long document. If a summary of the nova-side is desired, please ask.

Placement API Changes

A bug was fixed when updating allocations to prevent a 500 response if an empty set of resources was provided.

allocations was added to the list of links included in the resource providers representation. Initially identified by a bug.

The concept of a granular resource request syntax was partially implemented. This makes it possible to request resources in a more granular fashion as may be necessary in some situations with nested resource providers. The linked spec provides plenty of examples.

If a request does not provide an accept header, the service has been updated to default it to application/json. This ensures that all error responses have a standard (and structured) format. It was already the case that non-error responses would be JSON.

The format for GET and PUT of allocations was updated to be the same, and always be a dict-like structure. This allows easier and more consistent processing on the client side.

That new structure allowed for the creation of a POST to /allocations that allows an atomic action to manage allocations for multiple consumers in one request, supporting race-free migration allocations.

Nested resource provider information (root and parent) was added to the representation for /resource_providers and /resource_providers/{uuid} in a 1.14 microversion along with an in_tree query parameter to list the members of a tree of resource providers when provided with the UUID of one member of the tree.

The / URI in the Placement service no longer requires authentication. This helps support proper version discovery.

Entities in the Placement service now have minimal cache headers to prevent proxies and other HTTP clients doing inadvertent caching.

When requesting /allocation_candidates a limit parameter can be passed to constrain the number of allocation requests that are returned. This is designed to limit the amount of memory and bandwidth consumed by the scheduler. A config setting for the scheduler-side of the equation has been added.

A required parameter is added to GET /allocation_candidates enabling trait support. Resource providers will be filtered to only include those with the desired trait. Note: This functionality does not yet support reporting results in a form that is nested-resource-providers aware. That was something we hoped to get done in Queens but didn't quite make it.
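As a rough sketch of how the limit and required parameters combine in a request, the following builds such a query string. The function name, the placement.example.com endpoint and the trait are illustrative assumptions, and the microversion header a real client would need to send is deliberately left out because it is not stated here.

```python
from urllib.parse import urlencode


def allocation_candidates_url(base_url, resources, required=(), limit=None):
    """Build a GET /allocation_candidates URL from resource amounts, traits and a limit."""
    query = {
        # Resources are sent as comma-separated CLASS:amount pairs.
        "resources": ",".join(
            f"{name}:{amount}" for name, amount in sorted(resources.items())
        ),
    }
    if required:
        query["required"] = ",".join(required)
    if limit is not None:
        query["limit"] = str(limit)
    return f"{base_url}/allocation_candidates?{urlencode(query)}"


url = allocation_candidates_url(
    "http://placement.example.com",  # hypothetical endpoint
    {"VCPU": 2, "MEMORY_MB": 1024},
    required=["HW_CPU_X86_AVX2"],    # illustrative trait name
    limit=5,
)
print(url)
```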

by Chris Dent at February 05, 2018 01:45 PM

February 02, 2018

OpenStack Superuser

Outreachy program welcomes four new OpenStack interns

OpenStack provides open source software for building public and private clouds. The community is constantly moving and growing and very excited to invite newcomers; one of the ways we do that is through Outreachy internships.

A brief intro to the Outreachy program

Outreachy helps people from underrepresented groups get involved in free and open source software. These internships last three months and are organized twice a year. Several open source projects including GNOME, KDE, oVirt, Linux Kernel, Mozilla, Wikimedia, QEMU, Xen and, of course, OpenStack join and invite applicants to work on their projects.

Here’s what the current group will be working on this term with their mentors:

– Bugosi Phionah Nagaza, from Kampala, Uganda, working on “Go and Container related projects in OpenStack” with Davanum Srinivas.
– Kirsten Garrison, from Los Angeles, California working on “Container Monitoring” with Spyros Trigazis.
– Maysa de Macedo Souza, from Campina Grande, Brazil, working on “Add introspection HTTP REST points to the Kubernetes API watchers” with Antoni Segura Puimedon.
– Suramia Shah, from Almora, Uttarakhand, India, working on “Consolidate Keystone docs” with Lance Bragstad.

Victoria Martínez de la Cruz, a former intern who now acts as volunteer coordinator, has this to say about the internship program.

“I strongly believe this program is making a huge change, both for the communities involved and the interns. During my time as an intern, I learned a lot about open source, not only the technical aspects but also the social ones,” she says. “Now, as a mentor, I expect to give something back to our community and share this knowledge with newcomers, giving them the opportunity to work on something that inspires them, encouraging them to reach their goals and helping to reduce the diversity gap we nowadays have in most open source communities. Our community needs motivated hands and a fresh look to keep making OpenStack a huge success.”

If you’re interested in applying for the next round or helping out as a mentor, applications open February 12. Check out the timelines and application tips on the Wiki page.

The post Outreachy program welcomes four new OpenStack interns appeared first on OpenStack Superuser.

by Superuser at February 02, 2018 03:34 PM

February 01, 2018

Chris Dent

Placement Extraction

This is a followup to Placement Container Playground

Since its inception there's been a plan that the OpenStack Placement service will be extracted from the nova code repository into a repository of its own. There are a variety of reasons for this, including:

  • Nova is already huge.

  • It could help to ensure that Placement evolves as a service that is useful to all OpenStack services, not just Nova.

  • A different set of core reviewers and leaders could build up around the system who don't need expertise in all of Nova to be effective caretakers of Placement.

  • Small things are easier to maintain and reason about.

  • An independent Placement can be architecturally different from Nova without causing complexity within a shared repo.

It was with these ideas in mind that Placement was born, but inside of Nova. The choice to start in Nova was one of pragmatism and expediency. Awareness of an eventual departure allowed some differences in how things are done:

  • Placement uses its own WSGI framework.

  • Placement does not use a paste.ini file.

  • Placement uses gabbi for functional API testing.

But some things were done the same:

  • Objects that represent the entities in the Placement system are based on and within the nova.objects versioned objects hierarchy. Initially the objects were versioned and made available for RPC, but eventually that was turned off as it wasn't required.

  • Placement data are persisted in the nova_api database.

  • A few pieces of WSGI middleware were borrowed from Nova.

  • Placement shares its configuration file with nova, using nova.conf.

Recently I've started exploring what it will take to change things so that Placement either can be extracted, or if not extracted, at least run with minimal nova code imported into the Placement service.

Placement is designed to be run in a lightweight and distributed fashion with many small web services running against the same data store. The earlier Placement Container Playground describes some of the work that started that.

Through that exploration several areas of future work were discovered:

  • Throughout Nova, the __init__.py file in some packages includes code that causes a cascade of imports. Any sub-package which is "below" one of these busy __init__.py is subject to those imports, even if it doesn't need them.

    One particular area of concern for this is near the top of the nova.api.openstack package. I've explored some options for resolving this with Refactor WSGI apps and utils to limit imports.

    Of course nova/__init__.py itself imports code, including eventlet, which is not used by Placement, so that remains to be resolved. Even without being concerned about Placement, this is something that should be fixed as Nova is made up of multiple processes: they don't all need all the code.

    Fixing these things "simply" requires attention to the problem and recognition that it needs to be fixed.

  • By existing within the nova.objects hierarchy, the Placement-related versioned objects must co-exist with all of the nova versioned objects, none of which are used by Placement, but all of which are automatically loaded by nova/objects/__init__.py. Move resource provider objects into placement hierarchy explores moving the objects within the Placement hierarchy.

    In the process the ResourceClass field has to be made independent as it is used by both the Nova and Placement sides of their interaction.

  • The Nova FaultWrapper was being used, but it turns out it is doing more work than it needs to and a simpler version can be used instead.

  • All of the interaction with the database that Placement does is done within the objects. The code that comes from elsewhere is the api_context_manager, resource_class_cache, the table models, and database migrations.

    A long time ago and far away, the idea of an Optional separate database for placement API was declared worth exploring. I've been maintaining that branch ever since. It lets people use a different configuration setting for the database URL for placement.

    In the container experiments that configuration setting is used, but it points to the nova_api database.
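The import cascade described in the first item can be made visible with a small experiment. The sketch below does not touch Nova at all: busy_pkg, heavy_dep and leaf are made-up names for a toy package whose __init__.py imports something its leaf module never uses, and modules_pulled_in is a hypothetical helper that compares sys.modules before and after an import.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def modules_pulled_in(module_name):
    """Import module_name and return the names of modules newly added to sys.modules."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return sorted(set(sys.modules) - before)


def build_demo_tree(root):
    """Create a toy package whose __init__.py imports something its leaf never uses."""
    pkg = Path(root) / "busy_pkg"
    pkg.mkdir()
    (Path(root) / "heavy_dep.py").write_text("VALUE = 1\n")
    (pkg / "__init__.py").write_text("import heavy_dep\n")
    (pkg / "leaf.py").write_text("ANSWER = 42\n")


demo_root = tempfile.mkdtemp()
build_demo_tree(demo_root)
sys.path.insert(0, demo_root)
importlib.invalidate_caches()

# Importing only the leaf still runs busy_pkg/__init__.py, which drags in heavy_dep.
pulled = modules_pulled_in("busy_pkg.leaf")
print(pulled)
```

Running the same helper against a real sub-package is one way to see how much unrelated code a busy __init__.py brings along.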

Migrations are considered a significant challenge for extracting Placement to an independent repository and service. I'm not certain this has to be the case, especially if we are willing to consider:

  • Letting Placement point to the nova_api database for people who prefer to do things that way.

  • Freezing the schema for Placement-related data entities for some amount of time.

  • Allowing, and in fact encouraging, the Placement service to be capable of reconstructing itself. This means:

    • Any resource provider regularly checks for its own existence and recreates itself as required (this is already true).
    • Any resource provider can be caused to correct its own allocations (this is not currently true).

    If we have that, then the content of a Placement database can be remade. This makes it capable of recovering from a catastrophe, or just making itself right when for some reason the datastore gets out of sync with reality.

The way to make progress on this is to experiment, iterate, and communicate.

by Chris Dent at February 01, 2018 06:01 PM

OpenStack Superuser

OpenStack User Committee elections: Time to nominate and vote!

OpenStack has been a vast success and continues to grow. Additional ecosystem partners are enhancing support for OpenStack, and it has become more and more vital that the communities developing services around OpenStack lead and influence the product's movement.

The OpenStack User Committee helps increase operator involvement, collects feedback from the community, works with user groups around the globe and parses through user survey data, to name a few key tasks. Users are critical and the User Committee aims to represent them.

We’re looking to elect three (3) User Committee members for this election. These User Committee seats will be valid for a one-year term. For this election, the Active User Contributor (AUC) community will review the candidates and vote.

So what makes an awesome candidate for the User Committee?

Well, to start, the nominee has to be an individual member of the OpenStack Foundation who is an Active User Contributor (AUC).  Additionally, below are a few things that will make you stand out:

  • You are an OpenStack end-user and/or operator
  • You are an OpenStack contributor from the User Committee working groups
  • You are actively engaged in the OpenStack community
  • You organize a local OpenStack User Group meetup

Beyond the kinds of community activities you’re already engaged in, the User Committee role adds some additional work. The User Committee usually interacts on email to discuss any pending topics. Prior to each Summit, we spend a few hours going through the User Survey results and analyzing the data.

You can nominate yourself or someone else by sending an email to the user-committee@lists.openstack.org mailing-list, with the subject: “UC Candidacy” by February 11th, 2018 05:59 UTC.

Voting for the User Committee (UC) members opens on February 12, 2018 and remains open until February 18, 2018 11:59 UTC.

The email should include a description of the candidate and what the candidate hopes to accomplish. More information can be found here.

We look forward to receiving your applications!

The post OpenStack User Committee elections: Time to nominate and vote! appeared first on OpenStack Superuser.

by Melvin Hillsman at February 01, 2018 05:00 PM

Craige McWhirter

Querying Installed Package Versions Across An Openstack Cloud

AKA: The Joy of juju run

Package upgrades across an OpenStack cloud do not always happen at the same time. In most cases they may happen within an hour or so across your cloud, but for a variety of reasons some upgrades may be applied inconsistently, delayed or blocked on some servers.

As these packages may be rolling out a much needed patch or perhaps carrying a bug, you may wish to know which services are impacted in fairly short order.

If your OpenStack cloud is running Ubuntu and managed by Juju and MAAS, here's where juju run can come to the rescue.

For example, perhaps there's an update to the Corosync library libcpg4 and you wish to know which of your HA clusters have what version installed.

From your Juju controller, create a list of servers managed by Juju:

Juju 1.x:

$ juju stat --format tabular > jsft.out

Now you could fashion a query like this, utilising juju run:

$ for i in $(egrep -o '[a-z]+-hacluster/[0-9]+' jsft.out | cut -d/ -f1 | sort -u);
do juju run --timeout 30s --service $i "dpkg-query -W -f='\${Version}' libcpg4" | \
python -c 'import yaml,sys;print("\n".join(["{} == {}".format(y["Stdout"], y["UnitId"]) for y in yaml.safe_load(sys.stdin)]))';
done

The output returned will look something like this:

2.3.3-1ubuntu4 == ceilometer-hacluster/1
2.3.3-1ubuntu4 == ceilometer-hacluster/0
2.3.3-1ubuntu4 == ceilometer-hacluster/2
2.3.3-1ubuntu4 == cinder-hacluster/0
2.3.3-1ubuntu4 == cinder-hacluster/1
2.3.3-1ubuntu4 == cinder-hacluster/2
2.3.3-1ubuntu4 == glance-hacluster/3
2.3.3-1ubuntu4 == glance-hacluster/4
2.3.3-1ubuntu4 == glance-hacluster/5
2.3.3-1ubuntu4 == keystone-hacluster/1
2.3.3-1ubuntu4 == keystone-hacluster/0
2.3.3-1ubuntu4 == keystone-hacluster/2
2.3.3-1ubuntu4 == mysql-hacluster/1
2.3.3-1ubuntu4 == mysql-hacluster/2
2.3.3-1ubuntu4 == mysql-hacluster/0
2.3.3-1ubuntu4 == ncc-hacluster/1
2.3.3-1ubuntu4 == ncc-hacluster/0
2.3.3-1ubuntu4 == ncc-hacluster/2
2.3.3-1ubuntu4 == neutron-hacluster/2
2.3.3-1ubuntu4 == neutron-hacluster/1
2.3.3-1ubuntu4 == neutron-hacluster/0
2.3.3-1ubuntu4 == osd-hacluster/0
2.3.3-1ubuntu4 == osd-hacluster/1
2.3.3-1ubuntu4 == osd-hacluster/2
2.3.3-1ubuntu4 == swift-hacluster/1
2.3.3-1ubuntu4 == swift-hacluster/0
2.3.3-1ubuntu4 == swift-hacluster/2

Juju 2.x:

$ juju status > jsft.out

Now you could fashion a query like this:

$ for i in $(egrep -o 'hacluster-[a-z]+/[0-9]+' jsft.out | cut -d/ -f1 | sort -u);
do juju run --timeout 30s --application $i "dpkg-query -W -f='\${Version}' libcpg4" | \
python -c 'import yaml,sys;print("\n".join(["{} == {}".format(y["Stdout"], y["UnitId"]) for y in yaml.safe_load(sys.stdin)]))';
done

The output returned will look something like this:

2.3.5-3ubuntu2 == hacluster-ceilometer/1
2.3.5-3ubuntu2 == hacluster-ceilometer/0
2.3.5-3ubuntu2 == hacluster-ceilometer/2
2.3.5-3ubuntu2 == hacluster-cinder/1
2.3.5-3ubuntu2 == hacluster-cinder/0
2.3.5-3ubuntu2 == hacluster-cinder/2
2.3.5-3ubuntu2 == hacluster-glance/0
2.3.5-3ubuntu2 == hacluster-glance/1
2.3.5-3ubuntu2 == hacluster-glance/2
2.3.5-3ubuntu2 == hacluster-heat/0
2.3.5-3ubuntu2 == hacluster-heat/1
2.3.5-3ubuntu2 == hacluster-heat/2
2.3.5-3ubuntu2 == hacluster-horizon/0
2.3.5-3ubuntu2 == hacluster-horizon/1
2.3.5-3ubuntu2 == hacluster-horizon/2
2.3.5-3ubuntu2 == hacluster-keystone/0
2.3.5-3ubuntu2 == hacluster-keystone/1
2.3.5-3ubuntu2 == hacluster-keystone/2
2.3.5-3ubuntu2 == hacluster-mysql/0
2.3.5-3ubuntu2 == hacluster-mysql/1
2.3.5-3ubuntu2 == hacluster-mysql/2
2.3.5-3ubuntu2 == hacluster-neutron/0
2.3.5-3ubuntu2 == hacluster-neutron/2
2.3.5-3ubuntu2 == hacluster-neutron/1
2.3.5-3ubuntu2 == hacluster-nova/1
2.3.5-3ubuntu2 == hacluster-nova/2
2.3.5-3ubuntu2 == hacluster-nova/0

You can of course substitute libcpg4 in the above query for any package that you need to check.
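When the list is long, the interesting question is usually whether any unit differs from the rest. A minimal sketch, assuming the "version == unit" lines above have been saved or piped in; the function name and the sample data (with one deliberately lagging unit) are invented for illustration.

```python
from collections import defaultdict


def units_by_version(lines):
    """Group 'version == unit' lines by package version."""
    grouped = defaultdict(list)
    for line in lines:
        if "==" not in line:
            continue  # skip blank or unrelated lines
        version, unit = (part.strip() for part in line.split("==", 1))
        grouped[version].append(unit)
    return dict(grouped)


# Hypothetical sample: one unit lags behind the others.
sample = [
    "2.3.5-3ubuntu2 == hacluster-nova/0",
    "2.3.5-3ubuntu2 == hacluster-nova/1",
    "2.3.3-1ubuntu4 == hacluster-nova/2",
]

for version, units in sorted(units_by_version(sample).items()):
    print(version, len(units), ", ".join(units))
```

More than one key in the result means the package is not at the same version everywhere.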

By far and away my most favourite feature of Juju at present, juju run reminds me of knife ssh, which is unsurprisingly one of my favourite features of Chef.

by Craige McWhirter at February 01, 2018 10:19 AM

January 31, 2018

Adam Young

Matching Create and Teardown in an Ansible Role

Nothing lasts forever. Except some developer setups that no-one seems to know who owns, and no one is willing to tear down. I’ve tried to build the code to clean up after myself into my provisioning systems. One pattern I’ve noticed is that the same data is required for building and for cleaning up a cluster. When I built Ossipee, each task had both a create and a teardown stage. I want the same from Ansible. Here is how I’ve made it work thus far.

The main mechanism I use is a conditional include based on a variable set. Here is the tasks/main.yaml file for one of my modules:

- include_tasks: create.yml
  when: not teardown

- include_tasks: teardown.yml
  when: teardown

I have two playbooks which call the same role. The playbooks/create.yml file:

- hosts: localhost
  vars:
    teardown: false
  roles:
    - provision

and the playbooks/teardown.yml file:

- hosts: localhost
  vars:
    teardown: true
  roles:
    - provision

All of the real work is done in the tasks/create.yml and tasks/teardown.yml files. For example, I need to create a bunch of Network options in Neutron in a particular (dependency driven) order. Teardown needs to be done in the reverse order. Here is the create fragment for the network pieces:

- name: int_network
  os_network:
    cloud: "{{ cloudname }}"
    state: present
    name: "{{ netname }}_network"
    external: false
  register: osnetwork

- os_subnet:
    cloud: "{{ cloudname }}"
    state: present
    network_name: "{{ netname }}_network"
    name: "{{ netname }}_subnet"

- os_router:
    cloud: "{{ cloudname }}"
    state: present
    name: "{{ netname }}_router"
    interfaces: "{{ netname }}_subnet"
    network: public

To tear this down, I can reverse the order:

- os_router:
    cloud: rdusalab
    state: absent
    name: "{{ netname }}_router"

- os_subnet:
    cloud: rdusalab
    state: absent
    network_name: "{{ netname }}_network"
    name: "{{ netname }}_subnet"

- os_network:
    cloud: rdusalab
    state: absent
    name: "{{ netname }}_network"
    external: false

As you can see, the two files share a naming convention: name: “{{ netname }}_network” should really be precalculated in the vars file and then used in both cases. That is a good future improvement.

You can see the real value when it comes to lists of objects. For example, to create a set of virtual machines:

- name: create CFME server
  os_server:
    cloud: "{{ cloudname }}"
    state: present
    name: "cfme.{{ clustername }}"
    key_name: ayoung-pubkey
    timeout: 200
    flavor: 2
    boot_volume: "{{ cfme_volume.volume.id }}"
    security_groups:
      - "{{ securitygroupname }}"
    nics:
      - net-id: "{{ osnetwork.network.id }}"
        net-name: "{{ netname }}_network"
    meta:
      hostname: "{{ netname }}"
  register: cfme_server

It is easy to reverse this with the list of host names. In teardown.yml:

- os_server:
    cloud: "{{ cloudname }}"
    state: absent
    name: "cfme.{{ clustername }}"
  with_items: "{{ cluster_hosts  }}"

To create the set of resources I can run:

ansible-playbook   playbooks/create.yml 

and to clean up

ansible-playbook   playbooks/teardown.yml 

This pattern scales. If you have three roles that all follow this pattern, they can be run in forward order to set up, and in reverse order to tear down. However, it does tend to work at odds with Ansible’s role dependency mechanism: Ansible does not let you specify that the dependent roles should be run in reverse order during teardown.
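The ordering rule itself is independent of Ansible and can be sketched in a few lines of Python. Everything here is illustrative: run_steps, noop and the STEPS list are hypothetical names, and the no-op callables stand in for real provisioning calls.

```python
def run_steps(steps, teardown=False):
    """Run each step's create action in order, or its teardown action in reverse order."""
    ordered = reversed(steps) if teardown else steps
    action = "teardown" if teardown else "create"
    log = []
    for name, actions in ordered:
        actions[action]()
        log.append(f"{action}:{name}")
    return log


def noop():
    """Placeholder for the real provisioning call."""


# Hypothetical steps mirroring the network -> subnet -> router dependency order.
STEPS = [
    ("network", {"create": noop, "teardown": noop}),
    ("subnet", {"create": noop, "teardown": noop}),
    ("router", {"create": noop, "teardown": noop}),
]

print(run_steps(STEPS))
print(run_steps(STEPS, teardown=True))
```

Keeping create and teardown side by side in one structure is what lets a single flag flip both the action and the order.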

by Adam Young at January 31, 2018 07:13 PM

OpenStack Superuser

Looking ahead to the next version of Firewall-as-a-Service in OpenStack


At the Liberty summit in Vancouver, the networking team talked about the future of security groups and decided that, instead of redesigning security groups, it would put that innovation into Firewall-as-a-Service (FWaaS) V2. The V2 API has been evolving over the subsequent releases.

The first step was to enable applying rules to L3 ports and to provide a means to specify the direction of traffic that needs firewalling. This puts the API more in line with traditional firewall models. Now, with the Queens release, support for VM ports has been added. A default Firewall Group (FWG) is supported, just as in security groups, to ensure security when a VM boots. You can find out more about the roadmap from the Summit session in Boston.

FWaaS V2 can run standalone or in combination with security groups. FWaaS V2 offers a rich API to describe rules for blocking, dropping, or accepting traffic, as opposed to security groups, which only have allow rules. FWaaS V2 also works on both router (L3) and VM ports (L2). It currently only works with OVS.

FWaaS V2 Building Blocks


FWaaS V2 distinguishes between egress and ingress policies, which are assigned to a Firewall Group. Policies are made up of firewall rules. Firewall Groups are assigned to router or VM ports and handle traffic according to the rules in the policies.

For instance, a rule which allows ingress internet control message protocol (ICMP) traffic would look like:

Action  Protocol  ip-version  src-ip/port  dest-ip/port
allow   icmp      4           None         None

Similarly, if the action is “deny” or “reject”, ICMP traffic won’t reach the port. By default, all traffic is denied in both the ingress and egress directions. A user has to explicitly allow traffic.

Let’s consider the following example:

Action  Protocol  ip-version  src-ip/port  dest-ip/port
allow   icmp      4           None         None
deny    icmp      4           None         None

In this example we have two rules: one allowing ICMP traffic and the other denying it. In this case, traffic will be allowed, because firewall rules are processed in order and the allow rule comes first.
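The ordered evaluation described above can be modelled in a few lines. This is a simplified sketch, not the neutron-fwaas implementation: the Rule class and evaluate function are hypothetical, source and destination matching are ignored, and the default-deny fallback follows the behaviour stated earlier.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    action: str      # "allow", "deny" or "reject"
    protocol: str    # e.g. "icmp"
    ip_version: int  # 4 or 6


def evaluate(rules, protocol, ip_version):
    """Return the action of the first matching rule, or "deny" if none match."""
    for rule in rules:
        if rule.protocol == protocol and rule.ip_version == ip_version:
            return rule.action
    return "deny"  # default: traffic must be explicitly allowed


policy = [Rule("allow", "icmp", 4), Rule("deny", "icmp", 4)]
print(evaluate(policy, "icmp", 4))
print(evaluate(policy, "tcp", 4))
```

Swapping the two rules in the policy list flips the ICMP result, which is why rule order matters when a policy is built.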

Defense in depth

FWaaS V2 can run alongside security groups, and an administrator could open all the IPs/ports they feel are safe in the network. This would allow users to focus on which ports/IPs are safe for their application. Vice versa, security groups could handle what’s safe for the app and FWaaS V2 with its deny rules could restrict that further.

Having two systems working with each other allows users to split concerns in a sane way and enables an in-depth defense. We’re hoping in a future version to have strata so users can’t change policies assigned by admins.

FWaaS V2 has some features like sharing of resources between tenants, an audit flag to quickly identify what needs to be reviewed and a default Firewall Group which works similarly to the default security group. You can read more about it here.

Watch FWaaS V2 in action

How can you get involved?

IRC: #openstack-fwaas

Check out the contributor guide: https://docs.openstack.org/neutron-fwaas/latest/contributor/contributing.html

Read more about the specs: http://specs.openstack.org/openstack/neutron-specs/specs/newton/fwaas-api-2.0.html


Superuser is always interested in community content. Get in touch at editorATopenstack.org


The post Looking ahead to the next version of Firewall-as-a-Service in OpenStack appeared first on OpenStack Superuser.

by German Eichberger and Chandan Dutta Chowdhury at January 31, 2018 04:32 PM


4 new OpenStack tips and guides

Want to keep up with the open source cloud? Check out these fantastic new resources.

by Jason Baker at January 31, 2018 08:00 AM

OpenStack Blog - Swapnil Kulkarni

Kata Containers Dev environment setup with Vagrant

Following the steps in the Kata Containers Developers Guide, I set up the development environment. At the same time, I went ahead and created a little automation to recreate the environment with Vagrant.

The primary code to create the environment is pushed at vagrant-kata-dev.

To set it up, you will need:

  • VirtualBox (Currently only tested with virtualbox)
  • Vagrant with following plugins
    • vagrant-vbguest
    • vagrant-hostmanager
    • vagrant-share

To install the plugins, use the following command:

$ vagrant plugin install <plugin-name>

The setup instructions are simple. Once you have installed the prereqs, clone the repo:

$ git clone https://github.com/coolsvap/vagrant-kata-dev

Edit the Vagrantfile to update details

  1. Update the bridge interface so the box will get an IP address from your local network using DHCP. If you do not update it, Vagrant will ask for the interface name when you start the machine.
  2. Update the golang version; currently it's at 1.9.3.

Create the vagrant box with following command

$ vagrant up

Once the box has started, log in to it using the following command:

$ vagrant ssh

Switch to the root user, move to the Vagrant shared directory and run the setup script:

$ sudo su

# cd /vagrant

# ./setup-kata-dev.sh

The script performs the steps required to set up the dev environment. Verify that the setup completed correctly with the following command:

# docker info | grep Runtime
WARNING: No swap limit support
Runtimes: kata-runtime runc
Default Runtime: runc
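
The same check can be scripted. What follows is a minimal Python sketch, not part of the vagrant-kata-dev repo: the function name parse_runtimes is made up for illustration, and it only understands text in the format shown above.

```python
def parse_runtimes(docker_info_text):
    """Return (available_runtimes, default_runtime) from `docker info` text."""
    runtimes, default = [], None
    for line in docker_info_text.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Runtimes":
            runtimes = value.split()
        elif key == "Default Runtime":
            default = value.strip()
    return runtimes, default


# Sample text copied from the output above.
sample = """WARNING: No swap limit support
Runtimes: kata-runtime runc
Default Runtime: runc
"""

runtimes, default = parse_runtimes(sample)
if "kata-runtime" not in runtimes:
    raise SystemExit("kata-runtime is not registered with Docker")
print("kata-runtime is available; default runtime is", default)
```

In practice the text would come from running docker info (for example through subprocess) rather than from a hard-coded sample.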

I hope this helps new developers get started with Kata development. This is just the first version of the automation, so please help me improve it with your input.


by Swapnil Kulkarni at January 31, 2018 05:30 AM

Adam Young

Deploying an image on OpenStack that is bigger than the available flavors.

Today I tried to use our local OpenStack instance to deploy CloudForms Management Engine (CFME). Our OpenStack deployment has a set of flavors that are all defined with 20 GB disks. The CFME image is larger than this and will not deploy on any of these flavors. Here is how I worked around it.

The idea is that, instead of booting a server on Nova using an image and a flavor, first create a bootable volume, and use that to launch the virtual machine.

The command line way to create an 80 GB volume would be:

openstack volume create --image cfme-rhevm- --size 80 bootable_volume

But as you will see later, I used Ansible to create it instead.

Uploading the image (downloaded from the redhat.com portal):

openstack image create --file ~/Downloads/cfme-rhevm- cfme-rhevm-

Which takes a little while. Once it is done:

$ openstack image show cfme-rhevm-
| Field            | Value                                                                           |
| checksum         | 52c57210cb8dd2df26ff5279a5b0be06                                                |
| container_format | bare                                                                            |
| created_at       | 2018-01-30T21:09:20Z                                                            |
| disk_format      | raw                                                                             |
| file             | /v2/images/cfcca613-40d9-44c8-b12f-e0ddc93ab914/file                            |
| id               | cfcca613-40d9-44c8-b12f-e0ddc93ab914                                            |
| min_disk         | 0                                                                               |
| min_ram          | 0                                                                               |
| name             | cfme-rhevm-                                                           |
| owner            | fc56aad6163c44dc8beb0c287a975ca3                                                |
| properties       | direct_url='file:///var/lib/glance/images/cfcca613-40d9-44c8-b12f-e0ddc93ab914' |
| protected        | False                                                                           |
| schema           | /v2/schemas/image                                                               |
| size             | 1072365568                                                                      |
| status           | active                                                                          |
| tags             |                                                                                 |
| updated_at       | 2018-01-30T21:35:30Z                                                            |
| virtual_size     | None                                                                            |
| visibility       | private                                                                         |

I used Ansible to create the volume and the server. This is the fragment from my task.yaml file.

- name: create CFME volume
  os_volume:
    cloud: "{{ cloudname }}"
    image: cfme-rhevm-
    size: 80
    display_name: cfme_volume
  register: cfme_volume

- name: create CFME server
  os_server:
    cloud: "{{ cloudname }}"
    state: present
    name: "cfme.{{ clustername }}"
    key_name: ayoung-pubkey
    timeout: 200
    flavor: 2
    boot_volume: "{{ cfme_volume.volume.id }}"
    security_groups:
      - "{{ securitygroupname }}"
    nics:
      - net-id: "{{ osnetwork.network.id }}"
        net-name: "{{ netname }}_network"
    meta:
      hostname: "{{ netname }}"
  register: cfme_server

The interesting part is the boot_volume: "{{ cfme_volume.volume.id }}" line, which uses the value registered in the volume create step to get the ID of the new volume.
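
For comparison, the same two-step flow with the plain CLI can be sketched in Python. This is an illustration only: the helper name and the image, volume and server names are placeholders, and a real server create call would usually also need network and key options.

```python
import shlex


def boot_from_volume_commands(image, size_gb, volume, flavor, server):
    """Build the two CLI invocations: create the volume, then boot from it."""
    create_volume = ["openstack", "volume", "create",
                     "--image", image, "--size", str(size_gb), volume]
    create_server = ["openstack", "server", "create",
                     "--flavor", str(flavor), "--volume", volume, server]
    return [create_volume, create_server]


# Placeholder names; only the 80 GB size and flavor 2 come from the post.
commands = boot_from_volume_commands(
    "my-large-image", 80, "cfme_volume", 2, "cfme-server")
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The flavor still supplies CPU and RAM; only the root disk comes from the volume, which is why the 20 GB flavor limit no longer matters.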

by Adam Young at January 31, 2018 03:44 AM

Freeing up a Volume from a Nova server that errored

Trial and error. It's a key part of getting work done in my field, and I make my share of errors. Today, I tried to create a virtual machine in Nova using a bad Glance image that I had converted to a bootable volume.

The error message was:

 {u'message': u'Build of instance d64fdd07-748c-4e27-b212-59e8cef9d6bf aborted: Block Device Mapping is Invalid.', u'code': 500, u'created': u'2018-01-31T03:10:56Z'}

The VM could not release the volume.

$  openstack server remove volume d64fdd07-748c-4e27-b212-59e8cef9d6bf de4909df-e95c-4a54-af5c-c24a26146a89
Can't detach root device volume (HTTP 403) (Request-ID: req-725ce3fa-36e5-4dd8-b10f-7521c91a5c32)

So I deleted the instance:

  openstack server delete d64fdd07-748c-4e27-b212-59e8cef9d6bf

But when I went to list the volumes:

| ID                                   | Name        | Status | Size | Attached to                                                   |
| de4909df-e95c-4a54-af5c-c24a26146a89 | xxxx        | in-use |   80 | Attached to d64fdd07-748c-4e27-b212-59e8cef9d6bf on /dev/vda  |
$ openstack volume delete de4909df-e95c-4a54-af5c-c24a26146a89
Failed to delete volume with name or ID 'de4909df-e95c-4a54-af5c-c24a26146a89': Invalid volume: Volume status must be available or error or error_restoring or error_extending and must not be migrating, attached, belong to a group or have snapshots. (HTTP 400) (Request-ID: req-f651299d-740c-4ac9-9f52-8a603eace8f6)
1 of 1 volumes failed to delete.
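
The rule quoted in that error message can be written down as a small predicate. This is a sketch derived only from the message text above, not from Cinder's source; the function and parameter names are made up for illustration.

```python
# Statuses the error message lists as acceptable for deletion.
DELETABLE_STATUSES = {"available", "error", "error_restoring", "error_extending"}


def can_delete_volume(status, attached=False, migrating=False,
                      in_group=False, has_snapshots=False):
    """Mirror the deletion rule stated in the error message."""
    if status not in DELETABLE_STATUSES:
        return False
    return not (attached or migrating or in_group or has_snapshots)


# The wedged volume from the listing above: in-use and still attached.
print(can_delete_volume("in-use", attached=True))
# A detached volume in the available state.
print(can_delete_volume("available"))
```

The volume in the listing fails on both counts: its status is in-use, and it is still recorded as attached to the deleted server.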

To unwedge it I need to run:

$ cinder reset-state --attach-status detached de4909df-e95c-4a54-af5c-c24a26146a89
Policy doesn't allow volume_extension:volume_admin_actions:reset_status to be performed. (HTTP 403) (Request-ID: req-8bdff31a-7745-4e5e-a449-a5dac5d87f70)
ERROR: Unable to reset the state for the specified entity(s).

So, finally, I had to get an admin account (role admin on any project will work, still…)

. ~/devel/openstack/salab/rduv3-admin.rc
cinder  reset-state --attach-status detached de4909df-e95c-4a54-af5c-c24a26146a89

And now (as my non admin user)

$ openstack volume list  
| ID                                   | Name        | Status    | Size | Attached to                                   |
| de4909df-e95c-4a54-af5c-c24a26146a89 | xxxx        | available |   80 |                                               |
$ openstack volume delete xxxx
$ openstack volume list  
| ID                                   | Name        | Status | Size | Attached to                                   |

I talked with the Cinder team about the policy for volume_extension:volume_admin_actions:reset_status and they seem to think that it is too unsafe for an average user to be able to perform. Thus, a “force delete” like this would need to be a new operation, or a different flag on an existing operation.

We’ll work on it.

by Adam Young at January 31, 2018 03:44 AM

January 30, 2018

Chris Dent

TC Report 18-05

Your author has been rather ill, so this week's TC Report will be a bit abridged and mostly links. I'll try to return with more robust commentary next week.

RDO Test Days

dmsimard showed up in #openstack-tc channel to provide some details on forthcoming RDO Test Days.

More on the Goals

Conversations continue about choosing OpenStack goals. One issue raised is whether we have sufficient insight into how people are really using and deploying OpenStack to be able to prioritize goals.

Project Inception

smcginnis pointed out a governance discussion in the CNCF about project inception that he thought might be of interest to the TC. Discussion ensued around how the OpenStack ecosystem differs from the CNCF and as a result the need, or lack thereof, for help to bootstrap projects is different.

PTG Scheduling

The PTG is coming and with it discussion of how best to make room for all the conversations that need to happen, including things that come up at the last minute. New this time will be a more formal structuring of post-lunch presentations to do theme-setting and information sharing.

Board Meeting at the PTG

Monday there was discussion about the OpenStack Foundation Board Meeting overlapping with the first day of the PTG. This follows discussion in email.

Today's Board Meeting

I attended today's Board Meeting but it seems that according to the transparency policy I can't comment:

No commenting on Board meeting contents and decisions until Executive Director publishes a meeting summary

That doesn't sound like transparency to me, but I assume there must be reasons.

Update: The reasons are that it doesn't apply to non board members. The restriction is so that any commentary by board members that might be incorrect is not misconstrued as official utterances. So next time I'm able to attend a board meeting (I likely won't be at the next one due to the unfortunate overlap with the PTG activity) I ought to be able to summarize it here.

by Chris Dent at January 30, 2018 06:45 PM

OpenStack Superuser

Get started with “OpenStack Bootcamp”

There are a lot of ways to jump start your OpenStack knowledge. A recent book called “OpenStack Bootcamp” bills itself as a focused and systematic introduction, using practical examples and hands-on problems.  And despite the name, the author says it’s a “gentle introduction” to OpenStack.

Superuser talked to author Vinoth Kumar Selvaraj, a cloud engineer at CloudEnablers, about the “basic training” metaphor and what’s on his OpenStack bookshelf at the moment. (Check out his Superuser tutorials, too!) The 302-page book is available from Packt Publishing in eBook as well as hardcover formats.

Who will this book help most?


Yes! I assume the readers are completely new to the cloud world and are looking for intensive knowledge of OpenStack without wasting time learning ABC on day 1 and XYZ on day 2. This book is designed for beginners who want to jump right into practical knowledge, exercises and solving the basic problems encountered during deployment, in order to get up to speed with the latest release of OpenStack.

How did you get the idea for this book? (and why the “boot camp” analogy?)

I firmly believe that a hands-on experience with OpenStack will help the beginners to understand OpenStack design a lot better than just reading through the details.

This book will be more on filling in the practical learning gaps and this learn-by-doing approach gave it the title. I focused more on hands-on exercises for readers instead of starting with the history and evolution of OpenStack.

What are some of the most common problems new people to OpenStack have?

I think that the major problem for beginners is being overwhelmed by all the available functionality in OpenStack and not starting with something simple by focusing on core projects of OpenStack first. Users should be confident enough in operating the base components efficiently before they start picking up the bells and whistles of OpenStack.

People should also see the OpenStack from the architectural perspective and understand how the basic components interact.

Knowledge of the underlying architectural design is extremely useful in keeping your OpenStack infrastructure running. I could see that learning how all of the interoperability works is still a gap for beginners. I hope to bridge that gap with my book.

Why is a book helpful now — in addition to IRC, mailing lists, documentation, video tutorials etc.?

Yes! There are a lot of ways people can get to know OpenStack today. I think the volume of information out there on OpenStack, while certainly comprehensive, can be a little difficult for a newbie to start with.

As I said earlier, the book “OpenStack Bootcamp” is for beginners. I firmly believe my book would be a gentle starting point which would give them a good start to go and get further information from the resources you have mentioned.

What are some of the most exciting things you’ve seen recently in terms of OpenStack developments?

I have observed that the uptake of OpenStack in containerization, NFV and private cloud use cases is interesting at this time.

What’s on your OpenStack bookshelf?

Looking at my bookshelf now, I can see:

1) “OpenStack Cloud Security”

2) “Learning OpenStack High Availability”

3) “OpenStack Essentials – Second Edition”

4) “Containers in OpenStack”

(Full disclosure, I was the technical reviewer for these books) 🙂

If you want to fill out your shelf more, remember that the OpenStack Marketplace — your one-stop shop for training, distros, private-cloud-as-a-service and more — offers a selection of technical publications, too. The listings are not affiliate links, but offered as a way to highlight the efforts of community members.

Under the “books” heading, you’ll find titles by Stackers including “Mastering OpenStack,” “OpenStack Networking Essentials,” and “OpenStack: Building a Cloud Environment.”


The post Get started with “OpenStack Bootcamp” appeared first on OpenStack Superuser.

by Nicole Martinelli at January 30, 2018 03:32 PM

OpenStack in Production

Keep calm and reboot: Patching recent exploits in a production cloud

At CERN, we have around 8,500 hypervisors running 36,000 guest virtual machines. These provide the compute resources both for the laboratory's physics program and for the organisation's administrative operations such as paying bills and reserving rooms at the hostel. These resources are spread over many different server configurations, some of them over 5 years old.

With the accelerator stopping over the CERN annual closure until mid March, this is a good period to be planning reconfiguration of compute resources such as the migration of our central batch system which schedules the jobs across the central compute resources to a new system based on HTCondor. The compute resources are heavily used but there is more flexibility to drain some parts in the quieter periods of the year when there is not 10PB/month coming from the detectors. However, this year we have had an unexpected additional task to deploy the fixes for the Meltdown and Spectre exploits across the centre.

The CERN environment is based on Scientific Linux CERN 6 and CentOS 7. The hypervisors are now entirely CentOS 7 based with guests of a variety of operating systems including Windows flavors and CERNVM. The campaign to upgrade involved a number of steps:
  • Assess the security risk
  • Evaluate the performance impact
  • Test the upgrade procedure and stability
  • Plan the upgrade campaign
  • Communicate with the users
  • Execute the campaign

Security Risk

The CERN environment consists of a mixture of different services, with thousands of projects on the cloud, distributed across two data centres in Geneva and Budapest. 

Two major risks were identified
  • Services which provided the ability for end users to run their own programs along with others sharing the same kernel. Examples of this are the public login services and batch farms. Public login services provide an interactive Linux environment for physicists to log into from around the world, prepare papers, develop and debug applications and submit jobs to the central batch farms. The batch farms themselves provide 1000s of worker nodes processing the data from CERN experiments by farming event after event to free compute resources. Both of these environments are multi-user and allow end users to compile their own programs and thus were rated as high risk for the Meltdown exploit.
  • The hypervisors provide support for a variety of different types of virtual machines. Different areas of the cloud provide access to different network domains or to compute optimised configurations. Many of these hypervisors will have VMs owned by different end users and therefore can be exposed to the Spectre exploits, even if the performance is such that exploiting the problem would take significant computing time.
The remaining VMs are for dedicated services without access for end user applications or dedicated bare metal servers for I/O intensive applications such as databases and disk or tape servers.

There are a variety of different hypervisor configurations which we split down by processor type (in view of the Spectre microcode patches). Each of these needs independent performance and stability checks.

Processor name(s)
E5-2630 v3 @ 2.40GHz, E5-2640 v3 @ 2.60GHz
E5-2630 v4 @ 2.20GHz, E5-2650 v4 @ 2.20GHz
E5-2650 v2 @ 2.60GHz
CPU family: 21 Model: 1 Model name: AMD Opteron(TM) Processor 6276 Stepping: 2
E5-2630L 0 @ 2.00GHz, E5-2650 0 @ 2.00GHz
E5645 @ 2.40GHz, L5640 @ 2.27GHz, X5660 @ 2.80GHz

These risks were explained by the CERN security team to the end users in their regular blogs.

Evaluating the performance impact

The High Energy Physics community uses a suite called HEPSPEC06 to benchmark compute resources. These are synthetic programs based on the C++ components of SPEC CPU2006 which match the instruction mix of the typical physics programs. With this benchmark, we have started to re-benchmark (the majority of) the CPU models we have in the data centres, both on the physical hosts and on the guests. The measured performance loss across all architectures tested so far is about 2.5% in HEPSPEC06 (a number also confirmed by one of the LHC experiments using their real workloads) with a few cases approaching 7%. So for our physics codes, the effect of patching seems measurable, but much smaller than many expected.

Test the upgrade procedure and stability

With our environment based on CentOS and Scientific Linux, the deployment of the updates for Meltdown and Spectre was dependent on the upstream availability of the patches. These could be broken down into several parts:
  • Firmware for the processors - the microcode_ctl packages provide additional patches to protect against some parts of Spectre. This package proved very dynamic, as new processor firmware was being added on a regular basis and it was not always clear when it needed to be applied: the package version would increase, but the new version did not always include an update for the particular hardware type. Following through the Intel release notes, there were entries such as "HSX C0(06-3f-02:6f) 3a->3b", which means that the processor with description 06-3f-02:6f is upgraded from release 0x3a to 0x3b. The fields are the CPU family, model and stepping from /proc/cpuinfo, and the firmware level can be found at /sys/devices/system/cpu/cpu0/microcode/version. A simple script (spectre-cpu-microcode-checker.sh) was made available to the end users so they could check their systems, and this was also used by the administrators to validate the central IT services.
  • For the operating system, we used a second script (spectre-meltdown-checker.sh) which was derived from the upstream GitHub code at https://github.com/speed47/spectre-meltdown-checker.  The team maintaining this package was very responsive in incorporating our patches so that other sites could benefit from the combined analysis.
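
The firmware check described in the first bullet can be sketched in Python. This is not the spectre-cpu-microcode-checker.sh script, whose contents are not shown here; the function names are illustrative, and the sketch assumes the family-model-stepping fields are written as two-digit hexadecimal values, as in the quoted release note entry.

```python
import re


def cpu_signature(cpuinfo_text):
    """Build the family-model-stepping signature from /proc/cpuinfo text."""
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("cpu family", "model", "stepping"):
            fields.setdefault(key, int(value))  # keep the first CPU's values
    return "{:02x}-{:02x}-{:02x}".format(
        fields["cpu family"], fields["model"], fields["stepping"])


def parse_release_note(entry):
    """Parse an entry such as 'HSX C0(06-3f-02:6f) 3a->3b'."""
    match = re.search(
        r"\(([0-9a-f]{2}-[0-9a-f]{2}-[0-9a-f]{2}):[0-9a-f]+\)\s+"
        r"([0-9a-f]+)->([0-9a-f]+)", entry)
    signature, old, new = match.groups()
    return signature, int(old, 16), int(new, 16)


def needs_update(signature, running_version, entry):
    """True if the entry applies to this CPU and offers newer firmware."""
    entry_signature, _old, new = parse_release_note(entry)
    return entry_signature == signature and running_version < new


# Illustrative /proc/cpuinfo excerpt (values chosen to match the entry).
sample_cpuinfo = "cpu family\t: 6\nmodel\t\t: 63\nstepping\t: 2\n"
signature = cpu_signature(sample_cpuinfo)
# The version file holds a hexadecimal string such as "0x3a".
running = int("0x3a", 16)
print(signature, needs_update(signature, running, "HSX C0(06-3f-02:6f) 3a->3b"))
```

On a real host, the two inputs would be read from /proc/cpuinfo and /sys/devices/system/cpu/cpu0/microcode/version respectively.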

Communication with the users

For the cloud, there are several resource consumers.
  • IT service administrators who provide higher level functions on top of the CERN cloud. Examples include file transfer services, information systems, web frameworks and experiment workload management systems. While some are in the IT department, others are representatives of their experiments or supporters for online control systems such as those used to manage the accelerator infrastructure.
  • End users consume cloud resources by asking for virtual machines and using them as personal working environments. A typical case would be a macOS user who needs a Windows desktop: they create a Windows VM and use a protocol such as RDP to access it when required.
The communication approach was as follows:
  • A meeting was held to discuss the risks of exploits, the status of the operating systems and the plan for deployment across the production facilities. With a Q&A session, the major concerns raised were around potential impact on performance and tuning options. 
  • An e-mail was sent to all owners of virtual machine resources informing them of the upcoming interventions.
  • CERN management was informed of the risks and the plan for deployment.
CERN uses ServiceNow to provide a service desk for tickets and a status board of interventions and incidents. A single entry was used to communicate the current plans and status so that all cloud consumers could go to a single place for the latest information.

Execute the campaign

With the accelerator starting up again in March and the risk of the exploits, the approach taken was to complete the upgrades to the infrastructure in January, leaving February to find any residual problems and resolve them. As the handling of the compute/batch part of the infrastructure was relatively straightforward (with only one service on top), we will focus in the following on the more delicate part of hypervisors running services supporting several thousand users in their daily work.

The layout of our infrastructure with its availability zones (AVZs) determined the overall structure and timeline of the upgrade. With effectively four AVZs in our data centre in Geneva and two AVZs for our remote resources in Budapest, we scheduled the upgrade for the services part of the resources over four days.

The main zones in Geneva were done one per day, with a break after the first one (GVA-A) in case there were unexpected difficulties to handle on the infrastructure or on the application side. The remaining zones were scheduled on consecutive days (GVA-B and GVA-C), the smaller ones (critical, WIG-A, WIG-B) in sequential order on the last day. This way we upgraded around 400 hosts with 4,000 guests per day.

Within each zone, hypervisors were divided into 'reboot groups' which were restarted and checked before the next group was handled. These groups were determined by the OpenStack cells underlying the corresponding AVZs. Since some services needed to limit their window of downtime, their hosting servers were moved to the special Group 1, the only one for which we could give a precise start time.

For each group several steps were performed:
  • install all relevant packages
  • check the next kernel is the desired one
  • reset the BMC (needed for some specific hardware to prevent boot problems)
  • log the nova and ping state of all guests
  • stop all alarming 
  • stop nova
  • shut down all instances via virsh
  • reboot the hosts
  • ... wait ... then fix hosts which did not come back
  • check running kernel and vulnerability status on the rebooted hosts
  • check and fix potential issues with the guests
Shutting down virtual machines via 'virsh', rather than the OpenStack APIs, was chosen to speed up the overall process -- even if this required switching off nova-compute on the hosts as well (to keep nova in a consistent state). An alternative to issuing 'virsh' commands directly would be to configure 'libvirt-guests', especially in the context of the question whether guests should be shut down and rebooted (which we did during this campaign) or paused/resumed. This is an option we'll have a look at to prepare for similar campaigns in the future.
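
The group-by-group procedure can be pictured as a simple loop. The following Python sketch is an illustration, not the tooling used at CERN: the function name, group names and step callables are all hypothetical, and the real steps would call the package manager, virsh, IPMI and so on.

```python
def run_campaign(groups, steps):
    """Run every step on each reboot group in order; stop at the first failure.

    groups: mapping of group name -> list of hosts, in processing order
    steps:  ordered list of (label, function(hosts) -> bool)
    Returns (completed_group_names, failure), where failure is None or
    a (group_name, step_label) pair.
    """
    completed = []
    for name, hosts in groups.items():
        for label, step in steps:
            if not step(hosts):
                return completed, (name, label)
        completed.append(name)
    return completed, None


# Hypothetical groups and stub steps standing in for the real actions.
groups = {"group-1": ["hv001", "hv002"], "group-2": ["hv003"]}
steps = [
    ("install packages", lambda hosts: True),
    ("shut down instances", lambda hosts: True),
    ("reboot hosts", lambda hosts: True),
    ("check kernel", lambda hosts: "hv003" not in hosts),  # simulated failure
]

completed, failure = run_campaign(groups, steps)
print("completed:", completed, "stopped at:", failure)
```

Stopping at the first failed check keeps a problem confined to one reboot group, which matches the description of each group being restarted and checked before the next one was handled.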

As some of the hypervisors in the cloud had very long uptimes and this was the first time we systematically rebooted the whole infrastructure since the service went to full production about five years ago, we were not quite sure what kind of issues to expect -- and in particular at which scale. To our relief, the problems encountered on the hosts hit less than 1% of the servers and included (in descending order of appearance)
  • hosts stuck in shutdown (solved by IPMI reset)
  • libvirtd stuck after reboot (solved by another reboot)
  • hosts without network connectivity (solved by another reboot)
  • hosts stuck in grub during boot (solved by reinstalling grub) 
On the guest side, virtual machines were mostly ok when the underlying hypervisor was ok as well.
A few additional cases included
  • incomplete kernel upgrades, so the root partition could not be found (solved by booting back into an older kernel and reinstalling the desired kernel)
  • file system issues (solved by running file system repairs)
So, despite initial worries, we hit no major issues when rebooting the whole CERN cloud infrastructure!


While this kind of security issue does not arrive very often, the key parts of the campaign follow standard steps, namely assessing the risk, planning the update, communicating with the user community, executing the campaign and handling incomplete updates.

Using cloud availability zones to schedule the deployment allowed users to easily understand when there would be an impact on their virtual machines and encouraged the good practice of load balancing resources.



  • Arne Wiebalck
  • Jan Van Eldik
  • Tim Bell

by Tim Bell (noreply@blogger.com) at January 30, 2018 01:31 PM

Andrea Frittoli

Cross-service testing – The Tempest plugin

This post is the second part of a series about writing a Tempest plugin for cross-service integration testing in OpenStack.
If you missed the first post you might want to go back and read the introduction at least.

Integration tests for Nova and Neutron are included in Tempest, but those for Designate and Heat are not, so, after preparing the test environment, the second thing I needed was Tempest plugins for both projects. The Tempest plugin interface allows plugins to expose new service clients so that other plugins can easily configure and use them to write integration tests. Unfortunately, that interface was implemented by neither Designate nor Heat.

The Designate team maintains a Tempest plugin. I forked the plugin to add the service client plugin interface. The changes have since been merged back into the official plugin. The designate plugin included, at the time of writing, three different service clients, so I implemented the plugin interface to expose them to other plugins as follows:

def get_service_clients(self):
        dns_config = config.service_client_config('dns')
        admin_params = {
            'name': 'dns_admin',
            'service_version': 'dns.admin',
            'module_path': 'designate_tempest_plugin.services.dns.admin',
            'client_names': ['QuotasClient']
        }
        v1_params = {
            'name': 'dns_v1',
            'service_version': 'dns.v1',
            'module_path': 'designate_tempest_plugin.services.dns.v1',
            'client_names': ['DomainsClient', 'RecordsClient', 'ServersClient']
        }
        v2_params = {
            'name': 'dns_v2',
            'service_version': 'dns.v2',
            'module_path': 'designate_tempest_plugin.services.dns.v2',
            'client_names': ['BlacklistsClient', 'PoolClient', 'QuotasClient',
                             'RecordsetClient', 'TldClient',
                             'TransferRequestClient', 'TsigkeyClient',
                             'ZoneExportsClient', 'ZoneImportsClient']
        }
        # Merge the shared DNS client settings into each parameter set
        admin_params.update(dns_config)
        v1_params.update(dns_config)
        v2_params.update(dns_config)
        return [admin_params, v1_params, v2_params]

For this to work, all the clients must be available in the same module. However, as a common practice, both Tempest and plugins separate the service client for each API into a dedicated module. This problem can be solved by importing the clients and listing them in __all__ in the service client __init__.py module:

from designate_tempest_plugin.services.dns.v1.json.domains_client import DomainsClient
from designate_tempest_plugin.services.dns.v1.json.records_client import RecordsClient
from designate_tempest_plugin.services.dns.v1.json.servers_client import ServersClient

__all__ = ['DomainsClient', 'RecordsClient', 'ServersClient']
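
To see why the re-export matters, here is a minimal Python sketch of a loader that resolves each entry of client_names as an attribute of module_path. It is an assumption about the general mechanism, not Tempest's actual loader code, and the module fake_dns_v1 is a stand-in created on the fly so the example runs without Designate installed.

```python
import importlib
import sys
import types


def load_clients(params):
    """Look up every name in client_names as an attribute of module_path."""
    module = importlib.import_module(params["module_path"])
    return {name: getattr(module, name) for name in params["client_names"]}


# Stand-in package with one "client" class re-exported at package level,
# which is what the imports in __init__.py achieve for the real plugin.
pkg = types.ModuleType("fake_dns_v1")
pkg.DomainsClient = type("DomainsClient", (), {})
pkg.__all__ = ["DomainsClient"]
sys.modules["fake_dns_v1"] = pkg

clients = load_clients({"module_path": "fake_dns_v1",
                        "client_names": ["DomainsClient"]})
print(sorted(clients))

# A class that lives only in a submodule is not found at package level.
try:
    load_clients({"module_path": "fake_dns_v1",
                  "client_names": ["RecordsClient"]})
except AttributeError as exc:
    print("lookup failed:", exc)
```

In this sketch it is the import into the package namespace that makes the lookup succeed; __all__ then states which names are meant to be public.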

The Heat team maintains a Tempest plugin as well, but they don’t maintain a service client in Tempest format at all. I forked the existing code, added a service client and put the code on GitHub.

Preparing the new Tempest plugin

With all preconditions in place, I was ready to start working on the cross-service Tempest plugin.
The first step is to create the plugin skeleton using cookiecutter:

pip install cookiecutter
cookiecutter https://git.openstack.org/openstack/tempest-plugin-cookiecutter.git

I set project and repo_name to cross_service_tempest_plugin and the test class name to CrossServiceTempestPlugin.
The plugin must be an installable python package. Similar to what most OpenStack projects do, I used pbr to simplify the setup.py.
I added three files for this, which are available on the workshop GitHub repo:

  • setup.cfg
  • setup.py
  • requirements.txt

Files and entry points in setup.cfg must match the project and test class values passed to cookiecutter:

[files]
packages =
    cross_service_tempest_plugin

[entry_points]
tempest.test_plugins =
    cross_service = cross_service_tempest_plugin.plugin:CrossServiceTempestPlugin

The requirements file includes only Tempest and the two Tempest plugins. Since the plugins are not on PyPI, I specified them using their full Git URL:

tempest>=17.1.0 # Apache-2.0
-e git+https://github.com/afrittoli/designate-tempest-plugin#egg=designate_tempest_plugin
-e git+https://github.com/afrittoli/heat-tempest-plugin#egg=heat_tempest_plugin

Implement plugin.py

The next step was to implement in the plugin.py module all the methods exposed by Tempest in its Plugin interface plugins.TempestPlugin.

The first step was to create a class that inherits from the Tempest one and implements the load_tests method, to make tests from the plugin discoverable by Tempest.

import os

from tempest.test_discover import plugins

class CrossServiceTempestPlugin(plugins.TempestPlugin):
    def load_tests(self):
        # Directory that contains the plugin package
        base_path = os.path.split(os.path.dirname(
            os.path.abspath(__file__)))[0]
        test_dir = "cross_service_tempest_plugin/tests"
        full_test_dir = os.path.join(base_path, test_dir)
        return full_test_dir, base_path
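
To make the path handling concrete, the sketch below repeats the same computation for an example file location. The /opt/src path is an assumption chosen for illustration, and posixpath is used instead of os.path so the result is the same on any platform.

```python
import posixpath


def plugin_paths(plugin_file, test_dir):
    """Return (full_test_dir, base_path) for a given plugin.py location."""
    # dirname() gives the package directory; split()[0] gives its parent,
    # which is the top-level directory used for test discovery.
    base_path = posixpath.split(
        posixpath.dirname(posixpath.abspath(plugin_file)))[0]
    return posixpath.join(base_path, test_dir), base_path


full_test_dir, base_path = plugin_paths(
    "/opt/src/cross_service_tempest_plugin/plugin.py",
    "cross_service_tempest_plugin/tests")
print(full_test_dir)
print(base_path)
```

The first value is where the tests live and the second is the top-level directory from which their import paths are resolved.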

Then I wanted to make the plugin configurable. Since I wanted to be able to customise the DNS domain name, I needed a way to tell the test what domain name to use and expect. Tempest plugins allow extending the standard Tempest configuration file with plugin custom configuration groups and values.

The new configuration option was defined in a new module cross_service_tempest_plugin/config.py:

                 help="The DNS domain used for testing.")

I extended the CrossServiceTempestPlugin class to include the implementation of two more methods from the Tempest plugin interface:

  • register_opts is used by Tempest to register the extra options
  • get_opt_lists is used for config option discovery, used for instance to generate a sample config file

The implementation is straightforward:

    def register_opts(self, conf):
        ...

    def get_opt_lists(self):
        return [
            ...
        ]
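
Since the listings above are incomplete, here is a hedged Python sketch of how the option definition and the two methods could fit together. It is not the original plugin code: the group name cross_service, the option name dns_domain and its default are hypothetical, and small stand-in classes replace oslo.config so that the sketch runs without third-party packages. A real plugin would use cfg.OptGroup and cfg.StrOpt from oslo_config.

```python
from collections import namedtuple

# Stand-ins for oslo.config's OptGroup and StrOpt (assumption: a real
# plugin imports these from oslo_config.cfg instead).
OptGroup = namedtuple("OptGroup", "name title")
StrOpt = namedtuple("StrOpt", "name default help")

# Hypothetical group and option names, not taken from the original plugin.
cross_service_group = OptGroup(name="cross_service",
                               title="Cross-service options")
CrossServiceGroup = [
    StrOpt("dns_domain", default="example.org.",
           help="The DNS domain used for testing."),
]


class PluginConfigSketch:
    def register_opts(self, conf):
        # Tempest passes in its configuration object.
        conf.register_group(cross_service_group)
        conf.register_opts(CrossServiceGroup, group=cross_service_group.name)

    def get_opt_lists(self):
        # (group name, options) pairs, used for config option discovery.
        return [(cross_service_group.name, CrossServiceGroup)]


class RecordingConf:
    """Minimal fake exposing the two configuration methods used above."""

    def __init__(self):
        self.groups, self.opts = [], {}

    def register_group(self, group):
        self.groups.append(group.name)

    def register_opts(self, opts, group=None):
        self.opts.setdefault(group, []).extend(opt.name for opt in opts)


conf = RecordingConf()
plugin = PluginConfigSketch()
plugin.register_opts(conf)
print(conf.groups, conf.opts)
print(plugin.get_opt_lists()[0][0])
```

Keeping the group and the option list in one module lets both methods refer to the same objects, so registration and sample-config generation cannot drift apart.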

The plugin interface allows plugins to extend existing configuration groups with new configuration items. Plugins associated with a specific service usually extend tempest ServiceAvailableGroup with their own service. This is how it’s done in the plugin for designate: 

from oslo_config import cfg

service_available_group = cfg.OptGroup(name="service_available",
                                       title="Available OpenStack Services")

ServiceAvailableGroup = [
    cfg.BoolOpt("designate",
                help="Whether or not designate is expected to be available."),
]

It’s important to note that configuration groups extended this way should be neither registered in register_opts nor returned by get_opt_lists, since they are already known to Tempest.

Further examples are available in the documentation.
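
The contract between the two configuration methods can be sketched without any OpenStack dependency. In the sketch below, RecordingConf is a stand-in for the oslo.config object that Tempest passes in (it only mimics the register_group and register_opts calls), and the group name, option default and class names are hypothetical. A real plugin would use cfg.OptGroup and cfg.StrOpt objects instead of plain strings and tuples.

```python
class RecordingConf:
    """Stand-in for the config object; records what a plugin registers."""

    def __init__(self):
        self.groups = []
        self.opts = {}

    def register_group(self, group):
        self.groups.append(group)

    def register_opts(self, opts, group=None):
        self.opts.setdefault(group, []).extend(opts)


# Hypothetical plugin-specific group and option: (name, default, help).
GROUP_NAME = "cross_service"
CROSS_SERVICE_OPTS = [
    ("dns_domain", "example.org.", "The DNS domain used for testing."),
]


class SketchPlugin:
    def register_opts(self, conf):
        # Called at start-up so the extra options become known.
        conf.register_group(GROUP_NAME)
        conf.register_opts(CROSS_SERVICE_OPTS, group=GROUP_NAME)

    def get_opt_lists(self):
        # Used for option discovery, e.g. when generating a sample config.
        return [(GROUP_NAME, CROSS_SERVICE_OPTS)]


conf = RecordingConf()
plugin = SketchPlugin()
plugin.register_opts(conf)
print(conf.groups)
print(plugin.get_opt_lists())
```

Both methods refer to the same group and option list, which keeps the registered options and the discoverable options in sync.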

The fourth and last method of the plugin interface is used to expose service clients. I already had all the required service clients from either Tempest or the existing plugins, so I didn’t need to define any new ones.
I extended the CrossServiceTempestPlugin class with the following implementation of the service clients interface:

    def get_service_clients(self):
        # No extra service client defined by this plugin
        return []

At this point, I was ready to add a test class and start writing the test code. I created a new module test_cross_service.py under cross_service_tempest_plugin/tests/scenario. The basic class definition is:

from tempest import test

class HeatDriverNeutronDNSIntegration(test.BaseTestCase):

    def test_floating_ip_with_name_from_port_to_dns(self):
        pass  # placeholder; the scenario itself is the subject of the next post

With this code added to a git repo, I had an installable Tempest plugin, with configuration options and one discoverable test.
Since Tempest is installed in a python virtual environment by devstack, it was possible to test everything done to this point:

source ~/tempest/.tox/tempest/bin/activate
pip install cross_service_tempest_plugin
cd ~/tempest
tempest run --regex test_cross_service --list
tempest run --regex test_cross_service

In the next blog post, I will explain how to write the scenario test in Tempest.

by Andrea at January 30, 2018 10:46 AM

January 29, 2018

OpenStack Superuser

Tips for contributing to OpenStack

If you’re new to OpenStack and want to get started contributing, OpenStack Foundation Community managers are here to help.  Ildiko Vancsa, ecosystem technical lead, Kendall Nelson, upstream developer advocate and Mike Perez, development coordinator have a ton of tips for moving forward.

I’m new to OpenStack – how can I get started contributing?

If you are already planning on attending one of the upcoming summits, come a few days early so that you can come to the OpenStack Upstream Institute. It’s a day-and-a-half long training that teaches you the basics of the community, what tools you will be using on a regular basis and walks you through the contribution process.

Start engaging with the community. I can be cautious when approaching unfamiliar communities. Having been part of OpenStack for quite some time, I wish I had engaged sooner so I could have learned things faster.

Engagement and contributions come in different forms. Contributing to OpenStack is about far more than code; after all, we’re a community. A healthy community requires expertise of all different kinds, so you should feel that you have something to offer.

An example of a contribution other than code is the UX work led by Pieter Kruithof, who ran user studies to improve the OpenStack Client. That work was tremendous in showing the OpenStack Client team how people intended to use the client.

After attending upstream training, how can I find tools and resources to stay connected to the community and continue contributing?

The OpenStack Foundation and Documentation team have been working together in providing great resources for new and current OpenStack Contributors. The Contributor Portal provides a list of different ways our community contributes, and ways you can get involved in those areas.

For veteran contributors, a list of common resources is provided, which can be helpful for your own reference or for pointing fellow contributors to.

The Developer Digest provides quick summaries of conversations that can sometimes run to threads of more than 50 messages. If you don’t have time to scan through hundreds of emails weekly, definitely give the digest a read. It’s also community contributed, so it’s another way you can get involved in the community!

Where is my help needed in the OpenStack community?

The OpenStack Technical Committee approves the Top 5 Most Wanted Help list, whose items are proposed by the community. The list recognizes essential parts of our community that need help. For example, documentation is essential for our users to use OpenStack. Due to some events, we’ve lost a great deal of our documentation team. We now have specifications that are being led by people in our community who don’t normally focus on core documentation tasks, which is a great help to the Documentation team. You can also help!

What does it take to keep being a successful contributor?

Now that you’re engaged with the community, you must stay engaged. It only takes a few days to fall behind on the various mailing lists. It can seem extremely overwhelming, and it’s tempting to just subscribe yourself to all the initiatives in OpenStack. You’ll eventually figure out which channels you need to keep a close eye on to stay up-to-date:

  1. Figure out what you’re going to contribute to.
  2. Subscribe to the various things that your project group is involved with.
    1. Join IRC channels
    2. Subscribe to some mailing lists. You’ll get lots of email from some, so make sure to set up folder filters!
    3. Filter for messages in the various mailing lists for the subject containing tags of the project you’re part of. (e.g. for the Cloud Kitty Project the subject would contain [cloudkitty]).

Keeping up with the various OpenStack mailing lists can be tough, but luckily we have great summary resources in the OpenStack Blog like the OpenStack Dev Digest for keeping up with the Dev list, or the User Group Newsletter.

Events are a great way to learn and to hear from fellow members of the community about their experiences with OpenStack. You might discover something you can help contribute to the community!

Here’s a quick cheat sheet of events:

  • Project Team Gathering (PTG) – Every six months (Q1 and Q3), at the beginning of the development phase of a release cycle, project teams will meet in person to discuss priorities for the upcoming cycle, iterate quickly on solutions for complex issues and make fast progress on critical items. It’s part of a reorganization of the event formerly known as the Design Summit, which is now split between our biannual Summit (where feedback listening, requirements gathering and cross-community discussions happen) and the PTG (where the actual development work is being organized.) More info
  • The Forum – The entire OpenStack community (users and developers) gathers to brainstorm the requirements for the next release, gather feedback on the past version and have strategic discussions that go beyond just one release cycle. The OpenStack Foundation offers a Travel Support Program to help cover travel expenses. More info

If your company is unable to pay for you to attend the various OpenStack events, you can apply for the OpenStack Foundation’s travel support program.


The post Tips for contributing to OpenStack appeared first on OpenStack Superuser.

by Mike Perez, Kendall Nelson and Ildiko Vancsa at January 29, 2018 04:43 PM

Adam Young

Creating an Ansible Inventory file using Jinja templating

While there are lots of tools in Ansible for generating an inventory file dynamically, in a system like this, you might want to be able to perform additional operations against the same cluster. For example, once the cluster has been running for a few months, you might want to do a Yum update. Eventually, you want to de-provision. Thus, having a remote record of what machines make up a particular cluster can be very useful. Dynamic inventories can be OK, but often it takes time to regenerate the inventory, and that may slow down an already long process, especially during iterated development.

So, I like to generate inventory files. These are fairly simple files, but they are not one of the supported file types in Ansible. Ansible does support ini files, but inventory files may have lines that are not in key=value format.

Instead, I use Jinja formatting to generate inventory files, and they are pretty simple to work with.

UPDATE: I jumped the gun on the inventory file I was generating. The template and completed inventory have been corrected.

To create the set of hosts, I use the OpenStack server (os_server) task, like this:

- name: create servers
  os_server:
    cloud: "{{ cloudname }}"
    state: present
    name: "{{ item }}.{{ clustername }}"
    image: rhel-guest-image-7.4-0
    key_name: ayoung-pubkey
    timeout: 200
    flavor: 2
    security_groups:
      - "{{ securitygroupname }}"
    nics:
      - net-id: "{{ osnetwork.network.id }}"
        net-name: "{{ netname }}_network"
    meta:
      hostname: "{{ netname }}"
  with_items: "{{ cluster_hosts }}"
  register: osservers

- file:
    path: "{{ config_dir }}"
    state: directory
    mode: 0755

- file:
    path: "{{ config_dir }}/deployments"
    state: directory
    mode: 0755

- file:
    path: "{{ cluster_dir }}"
    state: directory
    mode: 0755

- template:
    src: inventory.ini.j2
    dest: "{{ cluster_dir }}/inventory.ini"
    force: yes
    backup: yes

A nice thing about this task is that, whether it is creating a new server or not, it produces the same output: a JSON object that has the server data in an array.

The following template is my current fragment.

{% for item in osservers.results %}
{{ item.server.interface_ip }}
{% endfor %}

{% for item in osservers.results %}
[{{ item.server.name }}]
{{ item.server.interface_ip  }}

{% endfor %}

{% for item in osservers.results %}
{% if item.server.name.startswith('idm')  %}
{{ item.server.interface_ip  }}
{% endif %}
{% endfor %}

ipa_server_password={{ ipa_server_password }}
ipa_domain={{ clustername }}
deployment_dir={{ cluster_dir }}
ipa_realm={{ clustername|upper }}
ipa_admin_user_password={{  ipa_admin_password }}
ipa_forwarder={{ ipa_forwarder }}
lab_nameserver1={{ lab_nameserver1 }}
lab_nameserver2={{ lab_nameserver2 }}

I keep the variable definitions in a separate file. This produces an inventory file that looks like this:








My next step is to create a host group for all of the nodes (node0 node1) based on a shared attribute. I probably will do that by converting the list of hosts to a dictionary keyed by hostname, and have the name of the groups as the value.
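
That conversion might look something like the following plain-Python sketch: a dictionary keyed by hostname, with group names as values, is inverted into groups and rendered as inventory sections. The host names, group names, addresses and function names are made up for illustration; in the playbook the data would come from the registered osservers results.

```python
from collections import defaultdict


def group_hosts(host_groups):
    """Invert {hostname: [group, ...]} into {group: [hostname, ...]}."""
    groups = defaultdict(list)
    for host, names in host_groups.items():
        for name in names:
            groups[name].append(host)
    return dict(groups)


def render_inventory(groups, addresses):
    """Render INI-style inventory sections, one per group."""
    lines = []
    for group in sorted(groups):
        lines.append("[%s]" % group)
        lines.extend(addresses[host] for host in groups[group])
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample data.
host_groups = {"idm": ["ipa"], "node0": ["nodes"], "node1": ["nodes"]}
addresses = {"idm": "192.0.2.10", "node0": "192.0.2.11", "node1": "192.0.2.12"}

print(render_inventory(group_hosts(host_groups), addresses))
```

Keeping the hostname-to-groups mapping as the source of truth means a host can belong to several groups without duplicating its address anywhere but the rendered file.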

by Adam Young at January 29, 2018 12:31 AM

January 28, 2018

Adam Young

Getting Shade for the Ansible OpenStack modules

When Monty Taylor and company looked to update the Ansible support for OpenStack, they realized that there was a neat little library waiting to emerge: Shade. Pulling the duplicated code into Shade brought along all of the benefits that a good refactoring can accomplish: fewer cut and paste errors, common things work in common ways, and so on. However, this means that the OpenStack modules are now dependent on a remote library being installed on the managed system. And we do not yet package Shade as part of OSP or the Ansible products. If you do want to use the OpenStack modules for Ansible, here is the “closest to supported” way you can do so.

The Shade library does not attempt to replace the functionality of the python-*client libraries provided by upstream OpenStack, but instead uses them to do work. Shade is thus more of a workflow coordinator between the clients. Thus, it should not surprise you to find that shade requires such libraries as keystoneauth1 and python-keystoneclient. In an OSP12 deployment, these can be found in the rhel-7-server-openstack-12-rpms repository. Thus, as a prerequisite, you need to have this repository enabled for the host where you plan on running the playbooks. If you are setting up a jumphost for this, that jumphost should be running RHEL 7.3, as that has the appropriate versions of all the other required RPMs as well. I tried this on a RHEL 7.4 system, and it turns out it has too new a version of python-urllib3.

Shade has one additional dependency beyond what is provided with OSP: Munch. This is part of Fedora EPEL and can be installed from the provided link. Then, shade can be installed from RDO.

Let me be clear that these are not supported packages yet. This is just a workaround to get them installed via RPMs. This is a slightly better solution than using pip to install and manage your Shade deployment, as some others have suggested. It keeps the set of Python code you are running tracked via the RPM database. When a supported version of shade is provided, it should replace the version you install from the above links.

by Adam Young at January 28, 2018 06:55 PM

January 26, 2018

OpenStack Superuser

Running an OpenStack cloud? Go to the next Operators Meetup

If you run an OpenStack cloud, attending the next Ops Meetup is a great way to swap best practices and share war stories.

This time around, it’ll be held over two days, March 7-8, in Tokyo at the Granpark Conference in the Shinagawa neighborhood.

You still have time to influence the sessions – so check out the Etherpad, where you’ll also find hotel info. The two days will be broken into tracks — general, NFV and enterprise — and suggestions so far include Cinder design for NFV, OpenStack SDKs, LTS releases and OpenStack on containers.

Tickets, limited to 150 participants, cost USD $20. (Previous editions focused on providing input for upcoming releases; however, with the addition of the Forum, ops folks are invited to collaborate with OpenStack upstream developers to share feedback and shape upcoming releases at the Summit.)

Got questions? Reach out to the OpenStack Ops mailing list or get involved in the planning by participating in the weekly planning meetings on IRC, which take place on Tuesdays at 1400 UTC.  If you need a visa to travel to Japan or have other travel-related questions, head over to the logistics Wiki.

And for what to expect from an Ops Meetup, check out these write-ups from previous editions held in Manchester and Mexico City.

The post Running an OpenStack cloud? Go to the next Operators Meetup appeared first on OpenStack Superuser.

by Superuser at January 26, 2018 02:37 PM

January 25, 2018

Adam Young

Using JSON home on a Keystone server

Say you have an AUTH_URL like this:

$ echo $OS_AUTH_URL 

And now you want to do something with it.  You might think you can get the info you want from the /v3 url, but it does not tell you much:

$ curl $OS_AUTH_URL 
{"version": {"status": "stable", "updated": "2016-10-06T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.7", "links": [{"href": "http://openstack.hostname.com:5000/v3/", "rel": "self"}]}}

Not too helpful.  Turns out, though, that there is data; it just requires the json-home Accept header.

You access the document like this:

curl $OS_AUTH_URL -H "Accept: application/json-home"


I’m not going to paste the output: it is huge.

Here is how I process it:

curl $OS_AUTH_URL -H "Accept: application/json-home" | jq '. | .resources '

This formats it somewhat legibly.  To get a specific section, say the endpoint list, you can find it in the doc like this:

 "http://docs.openstack.org/api/openstack-identity/3/rel/endpoints": {
 "href": "/endpoints"

And to pull it out programmatically:

curl -s $OS_AUTH_URL -H "Accept: application/json-home" | jq '. | .resources | .["http://docs.openstack.org/api/openstack-identity/3/rel/endpoints"] | .href'
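
If you would rather do the lookup from a script than from jq, the same walk through the document is only a few lines of Python. This is a minimal sketch that parses a trimmed, hand-written sample rather than a live response; the function name is an assumption, and fetching the document (with the Accept: application/json-home header) is left out.

```python
import json

ENDPOINTS_REL = "http://docs.openstack.org/api/openstack-identity/3/rel/endpoints"


def href_for(json_home_text, rel):
    """Return the href registered for a relation in a JSON home document."""
    document = json.loads(json_home_text)
    return document["resources"][rel]["href"]


# Trimmed, hand-written sample; a real document lists many more resources.
sample = json.dumps({"resources": {ENDPOINTS_REL: {"href": "/endpoints"}}})
print(href_for(sample, ENDPOINTS_REL))
```

The two dictionary lookups mirror the `.resources`, relation key and `.href` steps of the jq filter.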

by Adam Young at January 25, 2018 03:42 PM

OpenStack Superuser

Carnegie Mellon’s clear view on 5G cloudlets

Emerging edge cloud services like smart city applications have suffered from a lack of acceptable APIs and services that can dynamically provision and deliver services through provider edges. Apart from caching services, which providers use widely thanks to the economy of scale in the cloud, and a few low-traffic gaming apps, there is little to cheer about.

A talk about the future of edge at the OpenStack Summit in Sydney was itself something of an evolving situation: originally scheduled to be presented by Prakash Ramchandran, Rolf Schuster, and Narinder Gupta, it ended up being presented by Mark Shuttleworth of Canonical and Joseph Wang of Inwinstack, because none of the original presenters were able to travel due to visa problems.

The good news: it’s only a matter of time before there’s another generational wave of the radio frequency spectrum as the industry moves to 5G. “That will create a new set of possibilities, but it’s also going to create a huge amount of cost,” said Shuttleworth. “It’s going to be very expensive to deploy that next generation radio frequency.” 5G will enable very high speed, very low latency communication.

“What kind of killer applications could you create in a world where you could have compute that’s very, very close to a mobile device or very, very close to a car or very, very close to somebody walking around,” asked Shuttleworth. “What sort of applications would be interesting?” That, he said, is the heart of the research project at Carnegie Mellon University (CMU), where people are already creating a developer ecosystem to form the basis of the next generation of killer apps. On the other hand, he said, there are also people looking at operating standards and procedures as well as the economics around this new emerging technology.

Shuttleworth believes that this will emerge as a class of computing unto itself, in a line that started with desktops, then mobile apps, then data centers and large-scale clouds. The apps that emerged in each computing class were only possible with that new class; none could have existed in the previous era. “I tend to be more interested in the stuff that the Carnegie Mellon guys are interested in, which is essentially saying, ‘What kind of developers do we need to attract in order to get killer applications for edge computing?’”

The technology, said Shuttleworth, will have a latency of less than 50 milliseconds. “Well, 50 milliseconds is I think 20 times a second,” he said. “So anything that you need to refresh 20 times a second (will become) things that require near real time feedback, effectively.”

An example is augmented reality.  “If you want to essentially provide somebody with visuals that are overlaid on the real world,” said Shuttleworth, “then you need to refresh that 20 times a second and not be more than 50 milliseconds behind, effectively. So then you need an architecture that looks like this kind of edge cloud-type architecture.”

There are many other technologies, like the Internet of Things, that will also need to be part of the connected 5G ecosystem. “There’s no suggestion that these clouds are somehow decoupled from the internet or from public clouds,” Shuttleworth said. “But there’s a very strong suggestion that what they have to do is have to be relatively tightly coupled to the end user.”

People will have mobile agents, said Shuttleworth, whether that’s a cell phone, virtual reality goggles, augmented reality, a self-driving car, a robot, or a drone. “You want to minimize the amount of compute that you’re doing on that drone,” he said. “Energy is very precious for a drone or mobile phone.” There are heavy-duty applications that run on a phone, for sure, but they tend to use up the battery. If you can get the applications off the phone with minimal latency, you could save battery life.

Discovery of such devices is an important primitive, said Shuttleworth. “You initiate a session with a cloudlet. You would offload work to that cloudlet; you now have a relationship with that cloudlet. That cloudlet is less than 50 milliseconds away from you in the network, so you can have a very high or low latency, high bandwidth-type conversation with that digital twin or compute (in) whatever form it is.”

But your mobile device will be moving, so it might need to migrate to a different cloud. “So this starts to stress and exercise OpenStack in new and unusual ways,” said Shuttleworth. “We’ve all done a lot of work with live migration in the context of a cloud. For example, you’ve got a hypervisor, you’ve got a bunch of VMs on that hypervisor, you need to reboot that hypervisor. It’s better if you can live migrate the VMs off that hypervisor.”

You’ll be migrating within the cloud, which assumes you have a consistent low-cost, high-bandwidth network interconnect. While it’s easy to migrate between two rack servers, it’s not so simple when you’re migrating between two cloudlets that might have a significant distance between them. “So a lot of the work that Carnegie Mellon has done is in optimizing, effectively, that process,” said Shuttleworth. “And the way I would characterize that is that they’ve essentially brought together a lot of the thinking that you’d find inside something like Docker with, basically, VM operations.”

In the world of Docker, it’s normal to use layered file systems. For example, you can have two developers get the same copy of Ubuntu, and any changes made to one can easily be created in the other, just via the delta. In the world of edge networks, however, this type of delta sharing is done through some clever block-level primitives which compress and stream the changes in a short period of time. That allows for live migration of VMs in an efficient way over low-bandwidth environments.

That’s the concept, anyway: enabling a mobile agent (car, phone, drone, other) to discover a cloudlet, instantiate capability there, and then efficiently migrate it to a completely different cloudlet. “And then that is supposed to be the backdrop against which applications can emerge,” said Shuttleworth.

Joseph Wang took over the presentation with a high-level multi-access edge computing (MEC) schematic from the European Telecommunications Standards Institute (ETSI), which provides standards for all telecoms operating in the edge computing arena. “The ETSI also published a specifications API for people wanting to work applications on top of edge infrastructures,” said Wang. “So it lets the developer or the cloud service provider host the application on the edge gateway IT servers.” Developers need to follow this high-level API specification when creating applications on top of edge infrastructure.

Carnegie Mellon University (CMU) researchers have a branch of OpenStack that extends the platform to add some key capabilities, the first of which is the discovery of the aforementioned cloudlets. “So you imagine an agent running on your phone,” said Shuttleworth. The current implementation of this is on Android: you install an APK onto an Android phone, which is then able to discover cloudlets. The CMU campus has two cloudlets running over a prototype 5G network and the researchers there have found a way to effectively describe how to assemble an image from known parts, like Windows or Ubuntu, and then apply delta effects to get the image for a specific app on your phone. “(There’s) a language to essentially agree on things like resource allocations and network addresses and so on to essentially set up the session,” he continued.

Then the VM migration primitives, all of them open source, move these images around to keep the lag under 50 milliseconds. The researchers’ target, Shuttleworth said, is to support DSL-type speeds between cloudlets.

Inwinstack has upgraded Cloudlet (OpenStack++) from Kilo to Pike. Due to VM data compression, said Wang, things migrated quickly. Inwinstack does a lot of API work with the different technologies included in OpenStack, like Nova, Neutron, and Horizon. “If the OpenStack community people want to join,” said Wang, “we are more happy to ask you to join to work with us.”

Canonical, the company behind Ubuntu, got involved when Shuttleworth met CMU’s Professor Satyanarayanan (who goes by the name Satya) at 2015’s Tokyo OpenStack Summit. Shuttleworth was intrigued by the possibility of a new class of application that might emerge at the edge. “The engagement that we’ve had with his research team have mainly been around thinking about the operational consequences of having potentially tens of thousands of these cloudlets,” said Shuttleworth. “Because of the explicit latency target, you know that you have to have lots and lots and lots of them in order to cover a country, and that just means that you’re going to have to think about operating them in a totally automated way.”

Most of the work with OpenStack, he said, is typically with ten to twenty OpenStacks, but not ten to twenty thousand. So far, that’s something that’s truly difficult to achieve with OpenStack.

In addition, the ratio of cloudlet overhead to actual compute capacity becomes important. If there are only three, eight, or ten nodes, for example, two nodes of overhead is a much bigger chunk of the overall capacity than it would be if you were working with 200 nodes, said Shuttleworth. “So a lot of our interest is in figuring out ways to make this whole thing operable and efficient so that at scale, the economics will be clean,” he said.
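
The arithmetic behind that point is easy to check. The sketch below assumes, as in the example from the talk, a fixed overhead of two nodes per cloudlet; the function name is made up.

```python
def overhead_share(total_nodes, overhead_nodes=2):
    """Fraction of a cloudlet's nodes consumed by fixed overhead."""
    return overhead_nodes / total_nodes


for nodes in (3, 8, 10, 200):
    print(f"{nodes:>3} nodes: {overhead_share(nodes):.0%} overhead")
```

Two nodes are about two thirds of a three-node cloudlet, a quarter of an eight-node one, a fifth of a ten-node one, and only one percent of a 200-node deployment.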

One way to scale is with containerization. “We’ve been working with the CMU guys to show how you can do all the same things with a container rather than with a VM,” Shuttleworth said. “Now, the one thing that you can’t do with a container is run a Windows workload, so they have some great demos, which are literally Windows, like a Paint application that follows you around in the car. But for cases where your workload is a Linux workload, a container is going to be a much more efficient way to essentially distribute those workloads and then get more value out of that distributed compute.”

There are two ways the teams approach this. “One is LXD, which is more VM-like and has many of the same type of primitives you’d see in a VM of CentOS or Ubuntu and Kubernetes,” said Shuttleworth, “which is a sort of Docker orchestration capability that you can’t avoid hearing about at the Summit.”

The final piece of the puzzle, he said, is straight Bare Metal. “There are certain workloads where actually you might want to constitute for Bare Metal capability for that offload or pass through Bare Metal,” said Shuttleworth. “A lot of the augmented reality applications that are being developed depend on having access to GPGPU-type capabilities. So we’ve been working with the CMU guys to enable access to either raw networking or GPGPU.”

The goal, said Shuttleworth, is to enable any institution to set up cloudlets inexpensively. “If you just have five PCs, six PCs, you want to be able to press a button and have a cloudlet,” he said. “And then the next step would be essentially to enable different institutions to start collaborating so that apps can migrate effectively between cloudlets from different institutions.”

The start of that kind of future is in Pittsburgh at Carnegie Mellon. “They have two different cloudlets,” said Shuttleworth. “They have the beginnings of the radio frequency backhaul to support that. But I think they’re actively interested in getting more institutions with more diverse perspectives and specific interests from the point of view of the applications in particular to participate.”

The applications he’s seen are very interesting. They have one project between a telco and a firm of architects as well as one firm that works with drones. The project basically monitors a construction project just across the road from the university, said Shuttleworth. “They essentially are flying drones around and then comparing in real time the state of the building with the construction plans, effectively, so that they can very cheaply sign off on the quantity surveying parts of the construction project.”

There are other projects, including one in industrial automation, where someone wearing a pair of goggles can walk onto an industrial site and see exactly what valves they need to turn and other processes that need to happen without any previous knowledge. “The idea is to be able to detect exactly where they are and then overlay in their field of vision exactly the sequence of instructions,” said Shuttleworth.

Carnegie Mellon is the place it’s happening. “It’s going to be much easier to make the business case for 5G deployments if you have a real concrete sense of the apps that could exist in that world that are impossible in the world that we have today,” concluded Shuttleworth.

Check out the whole presentation below.

The post Carnegie Mellon’s clear view on 5G cloudlets appeared first on OpenStack Superuser.

by Rob LeFebvre at January 25, 2018 03:40 PM

Andrea Frittoli

Cross-service Tempest testing – Devstack

At the OpenStack summit in Sydney, I ran a workshop about writing a Tempest plugin for cross-service integration testing.
This is the first of a series of posts where I will cover the material from the workshop. To make this more practical for people to use, I will also include examples on how to integrate this kind of test in an OpenStack CI job. I expect to split this into four posts, roughly covering the following topics:

  • Introduction and how to set up the test environment using Devstack (this post)
  • How to create a Tempest plugin from scratch
  • How to write a Tempest scenario test for cross-service integration testing
  • Writing an OpenStack CI job in Zuul V3 native format

This series of posts assumes you have a basic familiarity with the OpenStack testing infrastructure and Tempest.
The infra user manual and Tempest docs can provide more background information.


As part of the “Technical Committee Vision for 2019”, the OpenStack TC introduced the idea of “constellations”, which are reference sets of OpenStack projects, defined to help users get started with OpenStack.

The idea of the workshop was to demonstrate what Tempest plugins are and how they can be used and combined to test a specific OpenStack constellation. Since a lot of (testing) code starts as copy/paste from an example, I wanted to have a good reference example for test developers to re-use.

Scenario Description

I set up a fictitious constellation for users who want to use the compute service with DNS. It includes the standard compute set: keystone, nova, neutron, glance and cinder, plus Designate for DNS services and Heat for Orchestration.
Nova, Neutron and Designate integration is well documented, however, there are no integration tests written to validate it. Using this constellation would give me the opportunity to build the workshop demo and also write something actually useful.

Test Scenario Diagram
Test Scenario (CC-BY 2018 Andrea Frittoli)

Setting up the test environment

The first thing I needed was a cloud running all the services I wanted, so I could fire tests against it. I needed the smallest possible footprint so that before the workshop I could spawn one cloud for each workshop attendee. I decided to use devstack, which is the OpenStack development and test environment. I chose OpenStack version Pike, which at the time of writing was the latest stable OpenStack release.

Configuring devstack for my purposes was easy enough, using a combination of Neutron and Designate configuration.
What I did was:

  • Disable Horizon and Swift to reduce the footprint
  • Enable heat and designate by loading the corresponding devstack plugins
  • Configure the DNS driver and DNS domain in Neutron
  • Set Designate specific settings in Neutron

Which resulted in the following changes:

# Use bind as backend for Designate

# Disable swift and horizon
disable_service horizon
disable_service s-proxy s-object s-container s-account

# Enable Heat
enable_plugin heat https://git.openstack.org/openstack/heat

# Enable Designate
enable_plugin designate https://git.openstack.org/openstack/designate






The only issue I encountered was with the Designate backend. The version of PowerDNS on Ubuntu Xenial does not match the existing driver, so it cannot be used until the driver is re-written. The default has since been changed to bind9 on stable/pike, so the extra configuration change should not be needed anymore.

The full configuration file is available here.

With devstack configuration ready, to stand up a test cloud I only need to run stack.sh. Configuration settings are then propagated into service configuration files as well as test configuration files (where relevant). The devstack plugin interface defines a test-config interface that can be used to configure the corresponding plugin or even core Tempest settings.

Create a workshop test image

I wanted the devstack setup to be reproducible so I could rebuild the test environment at any time. I also wanted to give workshop attendees an easy way to set up the test environment and re-run the workshop if they wanted to.
I wrote the automation in Ansible – the full code is available on GitHub.

The playbook provisions a virtual machine (VM) in an OpenStack cloud using Ansible OpenStack cloud modules. It then adds the created VM to ansible inventory dynamically:

  - hosts: localhost
    tasks:
      - name: Delete existing workshop VM
        os_server:
          cloud: "{{ cloud_profile }}"
          name: workshop_base
          state: absent
        when: force_new_vm
      - name: Setup a VM to create the workshop image
        os_server:
          state: present
          cloud: "{{ cloud_profile }}"
          name: workshop_base
          image: "{{ base_image_name }}"
          key_name: "{{ cloud_keyname }}"
          timeout: 200
          flavor: "{{ flavor_name }}"
        register: vm
      - name: Add the VM to the inventory
        add_host:
          name: "{{ vm.openstack.interface_ip }}"
          group: cloud
      - name: Wait for the VM to become ssh-able
        wait_for:
          port: 22
          host: "{{ vm.openstack.interface_ip }}"
          search_regex: OpenSSH
          delay: 10
          sleep: 5
          timeout: 120
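
To make the last task more concrete, here is a rough Python sketch of what "wait for the VM to become ssh-able" amounts to: connect to the port, read the greeting the server sends, and retry until it matches a pattern. It only loosely mirrors the playbook parameters (`delay`, `sleep`, `timeout`, `search_regex`); `wait_for_banner` is a hypothetical name and this is not how Ansible implements the module.

```python
import re
import socket
import time


def wait_for_banner(host, port=22, pattern="OpenSSH", delay=0, sleep=1, timeout=30):
    """Poll host:port until the first bytes sent by the server match pattern.

    Returns True on a match, False if `timeout` seconds pass without one.
    """
    time.sleep(delay)  # initial grace period before the first attempt
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=sleep) as conn:
                conn.settimeout(sleep)
                banner = conn.recv(256).decode("ascii", errors="replace")
                if re.search(pattern, banner):
                    return True
        except OSError:
            pass  # port closed or not answering yet; retry after a pause
        time.sleep(sleep)
    return False
```

With the values from the playbook this would be called as `wait_for_banner(ip, delay=10, sleep=5, timeout=120)`, where `ip` stands for the address registered by the provisioning task.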

Once the VM is provisioned and ssh-able, the playbook runs a number of roles against it to prepare it for devstack and to prefetch relevant repositories. The script “stack.sh” is executed at the first boot of the VM thanks to a systemd unit file:

[Unit]
Description=A one shot service - run stack.sh as stack at boot

[Service]
ExecStart={{ BASE }}/workshop/stack_service.sh start
ExecStop={{ BASE }}/workshop/stack_service.sh stop


Finally, a new snapshot is taken of the VM and the old snapshot (if any) is deleted.

Reconfiguring the test environment

I wanted people attending the workshop to be able to configure their own DNS domain for the exercise. However, running stack.sh is time-consuming (20+ minutes) and I did not want to spend that time during the workshop. To solve this I asked attendees to change the configuration directly in Neutron:

# Edit the config file
vim /etc/neutron/neutron.conf
# Set dns_domain. The new domain must end with a '.'

# Save and close. Restart all neutron services
sudo systemctl restart devstack@q-*

# To check the status of a service:
sudo systemctl status devstack@q-svc

# To check logs, use journalctl: `-a` for whole logs, `-f` to tail
sudo journalctl [-a|-f] --unit devstack@q-*

# Setup the openstack client to use the test cloud
export OS_CLOUD=devstack
openstack image list

# Perform admin operations
openstack --os-cloud devstack-admin service list
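
The manual edit of dns_domain could also be scripted. The sketch below is an illustration, not part of the workshop material: `set_dns_domain` is a hypothetical helper, the sample file content is invented, and it assumes `dns_domain` sits in the `[DEFAULT]` section. It appends the trailing '.' when it is missing, as the comment above requires.

```python
import configparser
import io


def set_dns_domain(conf_text, domain):
    """Return conf_text with dns_domain in [DEFAULT] set to domain.

    A trailing '.' is appended when missing, since the domain must end
    with one.
    """
    if not domain.endswith("."):
        domain += "."
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    parser["DEFAULT"]["dns_domain"] = domain
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Invented sample content; a real neutron.conf holds many more options.
sample = (
    "[DEFAULT]\n"
    "dns_domain = openstacklocal.\n"
    "\n"
    "[designate]\n"
    "url = http://127.0.0.1:9001/v2\n"
)
updated = set_dns_domain(sample, "workshop.example.org")
print(updated)
```

Note that `configparser` drops comments and rejects repeated options, so this approach is only safe on simple files. The Neutron services still need to be restarted afterwards, as shown above.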

In the next blog post I will explain how to create a new Tempest plugin.

by Andrea at January 25, 2018 11:24 AM

January 24, 2018

OpenStack Superuser

New ways to build and test additions to OpenStack

When you’re building software for the OpenStack ecosystem, a key challenge is doing integration testing: you need to be able to get a working, “clean” and up-to-date install of OpenStack as quickly and easily as possible. To help solve this problem, CloudLab, OpenLab, and the OpenStack Foundation are teaming up to provide the OpenStack community with powerful new ways to build and test additions to the OpenStack software ecosystem.

CloudLab is a facility funded by the National Science Foundation to support work on the fundamental architectures of cloud computing and the new applications they enable. CloudLab is not, itself, a cloud: it’s a laboratory environment that gives researchers the ability to build their own clouds by controlling and monitoring the environment at a lower level than typical cloud software stacks.

For instance, the “hello world” of CloudLab is to create a new OpenStack cloud from scratch. The process takes about 10 minutes, and at the end, the user has complete control of their own private OpenStack cloud, allowing full administrative rights and root privileges to monitor, modify and replace any part of the system. This includes OpenStack itself, the hypervisor, the networking and storage systems and any software built on top.

CloudLab has about 1,000 machines in three data centers at University of Utah, University of Wisconsin – Madison and Clemson University. It offers bare-metal hosts, an Ethernet network with OpenFlow 1.3 throughout, isolated storage (HDD, SSD, and NVMe), and cross-country Layer 2 networking through Internet2.  The CloudLab sites are in the process of adding hundreds more hosts, which will feature top-end GPUs, field-programmable gate arrays, programmable network interface controllers and user-imagable (ONIE) switches.

OpenLab is a community-led program to test and improve support for the most popular software development kits (SDKs)—as well as platforms like Kubernetes, Terraform, Cloud Foundry and more—on OpenStack. The goal is to improve the usability, reliability and resiliency of tools and applications for hybrid and multi-cloud environments.

Huawei and Intel are both contributing full-time contributor resources and infrastructure to the project at its launch and Open Telekom Cloud and VEXXHOST are providing OpenStack-powered public cloud infrastructure for testing. OpenLab is in the formative stages and looking for additional contributors and feedback.

CloudLab is partnering with OpenLab to make these resources available to projects in the OpenStack ecosystem that come from academia and/or have research or education aspects.

If you think you may qualify under these criteria, please see the CloudLab acceptable use policy at https://cloudlab.us/aup and contact info@openlabtesting.org for more details on getting started.


Superuser is always interested in community content; get in touch at editorATsuperuser.org


Cover Photo // CC BY NC

The post New ways to build and test additions to OpenStack appeared first on OpenStack Superuser.

by Melvin Hillsman and Robert Ricci at January 24, 2018 02:17 PM

Lars Kellogg-Stedman

Safely restarting an OpenStack server with Ansible

The other day on #ansible, someone was looking for a way to safely shut down a Nova server, wait for it to stop, and then start it up again using the openstack cli. The first part seemed easy:

- hosts: myserver
    - name: shut down the server
      command: poweroff
      become: true …

by Lars Kellogg-Stedman at January 24, 2018 05:00 AM

January 23, 2018


Aptira joins the Linux Foundation Networking Project

The big news today is that the Linux Foundation has launched its Linux Foundation Networking (LFN) project that brings together several of its network-oriented projects like ONAP, ODL and OPNFV into one governance unit. This increases efficiency, fosters project synergies, accelerates adoption and dramatically increases the level of funding into the associated communities. 

Aptira is pleased to announce that we have signed on as a Gold Sponsor of the Linux Foundation Networking project. 

What does this mean for Aptira? 

For several years a chunk of Aptira’s business has been driven by the Telecommunications sector. Our years of OpenStack expertise have been shaped by the need to have OpenStack meet the standard of Telco grade, so we’ve never been afraid to say “no” where there was any doubt when something didn’t meet that grade. We probably lost a bit of work with that positioning but what we gained was far more valuable. Now widely regarded as a “Trusted Advisor” to our customers, we are very fortunate to lead some fantastic cutting edge and transformative projects. As the old saying goes, we’ve done this by teaching our customers to fish, rather than handing them a fish. 

At the front of this cutting-edge Telco work came the ONAP project. It quickly became obvious that the skill set our team had developed in OpenStack and the technologies that surround it would let us become confidently proficient in ONAP very quickly. 

Why the Linux Foundation Networking project? 

In 2013 Aptira became a Gold Member Sponsor of the OpenStack Foundation. It took a huge effort back then for us to meet the criteria to achieve that Gold status, and since then we have done our best to provide our unique vendor-independent and operations-focused voice to the community. With the LFN, we find ourselves once again able to provide an Ops-focused independent voice. We also have the same excitement about it as when we dived into the OpenStack Foundation! As we did with OpenStack, we bring regional, scale and organisational diversity to LFN, and diversity will undoubtedly benefit the organisation and the projects within. 

But what about OpenStack? 

In short, we remain fully committed to OpenStack. OpenStack has been a strong part of our business and will be in the future, and we’ve enjoyed playing a part in its governance. We feel it’s now time to pass on the governance baton and make way for the next iteration of its lifecycle. With the Sydney Summit last November our years of organising the community here in Australia reached a climax too. Jessica, Kavit, Roland and I have all had some fun on the Board, and with all 24 Gold spots full we felt it was time to make way for someone else to get their Gold badge. 

Most importantly, we see ONAP as very complementary to OpenStack and our team will most certainly keep participating in OpenStack in the community and technically. We will still be holding events like our recent OpenStack Days in Sydney, Melbourne and Canberra with the next one coming up in June, and OpenStack will be a part of that. Our pragmatic approach to OpenStack has seen us outmanoeuvre and outlast many other organisations, and we are confident of maintaining our leading position in APAC and continuing to grow our OpenStack business. 

The Future 

We’re excited about what 2018 brings both immediately and beyond. Working with technologies and communities we know well will allow us to offer customers the trusted counsel and technical expertise we’re known for, and we’re particularly excited to be joining our awesome friends like Cloudify, Inocybe and SUSE in founding the LFN. We look forward to seeing you at the various events we’ll be attending and hosting this year, and we encourage you to get in touch with any projects, or with any queries about the above. 

The post Aptira joins the Linux Foundation Networking Project appeared first on Aptira.

by Tristan at January 23, 2018 10:32 PM

Chris Dent

TC Report 18-04

When a person is in early adolescence they get cramps in their legs and call it growing pains. Later, in adulthood, there's a different kind of pain when the strategies and tactics used to survive adolescence are no longer effective and there's a chafing that won't subside until there's been a change in behavior and expectations; an adaptation to new constraints and realities.

Whatever that is called, we've got it going on in OpenStack, evident in the discussion had in the past week.

OpenStack-wide Goals

There are four proposed OpenStack-wide goals:

These need to be validated by the community, but they are not getting as much feedback as hoped. There are different theories as to why, from "people are busy", to "people don't feel empowered to comment", to "people don't care". Whatever it is, without input the onus falls on the TC to make choices, increasing the risk that the goals will be perceived as a diktat. As always, we need to work harder to have high fidelity feedback loops. This is especially true in our "mature" phase.

Interop Testing

Despite lots of discussion in email and on the review, the effort to clarify how trademark and interop tests are to be managed remains unresolved. Some discussion today explored whether there is an ordering problem.

I find the whole thing very confusing. People who care about trademark tests should write and review any new ones in a trademark repo that hosts the trademark tempest plugin. Existing tests should migrate or be copied there as time allows. Then the trademark tests have a single responsibility and a single home and we don't have to think so much. People imply that this is crazy, and yes, it requires some effort and has some duplication, but doesn't everything?

Scope of OpenStack Projects

Last Thursday, dhellmann started a conversation about what makes an OpenStack project, prompted by Qinling's application to be "official". The adult reality here is stated pretty clearly by Doug:

we used to have 2 options, yes or no. Now we have yes, no, and "let us help you set up your own thing over here"

To some extent gatekeeping projects is the main job of the TC, and now we've made it a bit more confusing.

PTL Balance

In this morning's office hours we had a discussion about ways to help the PTL role (especially of the larger and most active projects) be more manageable and balanced. The main challenge is that as currently constituted, the person in the PTL role often needs to keep the state of the whole project in their head.

That's not sustainable.

by Chris Dent at January 23, 2018 06:45 PM

OpenStack Superuser

Want to go cloud native? Start with the right people, says author

Any organization can move to cloud native, says Justin Garrison, senior systems engineer at Disney, if they start with the right people.

“You can’t just tell everyone we’re going to use this new thing, go have at it. Because they’re going to do the same things they did before this environment came along or try to work their way around it,” says Garrison, co-author of “Cloud Native Infrastructure: Patterns for Scalable Infrastructure and Applications in a Dynamic Environment,” speaking on Cisco’s “Cloud Unfiltered” podcast.

And while challenges abound, there are reasons to make the effort to move into this new territory.

“People just don’t scale, people make mistakes and people aren’t good at doing the same thing over and over again,” he says. “Whereas you can have declarative things set in policy and you know how your infrastructure should look that the computer….(and) the computer will do that over and over again and be really good at it.”

It’s also about a two-way relationship with infrastructure, he says, crediting co-author Kris Nova with the concept of infrastructure as software, rather than infrastructure as code. In a cloud native environment, you want software that manages the infrastructure. In other words, it’s not a one-way process where code is pushed and then something happens; the interface changes how the infrastructure looks on both sides, he adds.

Garrison got started in tech in college, while “babysitting labs” and in open source around the same time through a second-hand computer and Linux. Now heavily involved in Kubernetes and CNCF, he offered some advice about getting started in open source communities.

“It doesn’t matter what your skill sets are, it matters that you can provide time,” he says. “If you can offer time, an hour a week, half an hour a day, just go troll through the forums or GitHub repos and find a need.”

You can watch the whole 41-minute episode on YouTube or catch Garrison’s talk on cloud native at the upcoming SoCal Linux Expo.

The post Want to go cloud native? Start with the right people, says author appeared first on OpenStack Superuser.

by Superuser at January 23, 2018 02:00 PM

January 22, 2018

OpenStack Superuser

Challenges and battle scars from OpenStack deployments

Sometimes, helping a large enterprise client choose and implement OpenStack is a real conquest.

DXC’s Anupriya Ramraj and Rick Mathot were joined by American Airlines’ Farhad Sayeed at the OpenStack Summit in Sydney to talk about the toughest battles (and a success story) in building OpenStack deployments for enterprise clients.

The first challenge is helping a client choose between a distribution and upstream OpenStack. Ramraj recommends the distribution from multiple vendors. Even then, it’s important to make sure that the infrastructure needs of the company match up with the right distribution. “You might end up with a load balance that’s not supported with a particular version of OpenStack that’s supported by the distribution,” she said. “Just getting that mix right, and delays because of that, is something that we see as a challenge.”

The next trick is dealing with getting OpenStack on older hardware. “It’s got its own unique challenges when you’ve got to go look up those drivers, get everything updated,” Ramraj said. “It can easily add up to months of delays there.”

Even once you’ve gotten OpenStack deployed, there’s bound to be a chaotic incident at some point. “Trouble-shooting OpenStack does require some unique knowledge of OpenStack services and understanding the dependencies between the services,” she said.

Getting help from the distribution provider can be tricky, too, as just reproducing the error or finding the right log files requires OpenStack expertise, which your client may or may not have in-house. “OpenStack skills are tough to hire and they’re tough to retain,” said Ramraj. “The audience out here, you should be proud that you’ve got OpenStack on your resume. It makes you highly valuable, because it is a tough skill to get out there in the market.”

Once all this ground has been covered, you still need to win over the apps group. “It’s always a tug-of-war between apps and infra(structure),” said Ramraj. Apps owners can be reluctant to move to OpenStack and the business case for OpenStack is getting those apps to successfully run. Sometimes, app teams want to use newer app deployment methods, and convincing them that Ansible and containers work on OpenStack is another challenge.

“Is it a Heat API to do the infrastructure automation, is it Kubernetes on OpenStack,” Ramraj asked. “Winning the challenge, winning the mind share of the apps guys, is a unique challenge by itself.”

Assuming all of the previous issues have been solved (infrastructure, support, apps, production workloads), there will come a time when you need to upgrade OpenStack. “Upgrades in recent years have gotten much less intrusive,” said Ramraj, “much more dependable, but still there’s some unique challenges.”

Using managed services for reinforcement

All of these issues can be mitigated with a good managed services provider, said Ramraj. The client does not need to do this alone. A provider brings two things to the table. The first is the initial consultation on the design (which involves choosing the right distribution and architecture, setting up networking and storage block objects, for example).

Secondly, a good managed services provider can offer ongoing services, which can look a lot like traditional IT services management.  “OpenStack does not exist in an island,” said Ramraj. “OpenStack exists in tandem with other services that the client is consuming.” For example, a large enterprise client will likely have to bill individual departments for OpenStack consumption. Setting the billing up correctly is a solid task for a provider.

Another is incident management. A good managed service provider will offer 24/7 operations with highly skilled OpenStack people who can help with capacity management, storage optimization, or growth-based compute scaling.

Different clients will need different hand-offs, too, in terms of managing the OpenStack deployment. “If you look at this as a layered graph,” said Ramraj, “typically we say the managed services provider will handle up to the OpenStack layer and then the client owns the virtual machines and the applications that they onboard onto the stack.”

Of course, every client is different. Some might only want to own the business outcome, which means the provider manages even the VMs and app onboarding. In addition, handling the VMs might prove another challenge. “One client wanted to back up all those VMs and the rate of the backups and the frequency they wanted to do it caused some issues in terms of configuring Neutron to be able to deal with that network bandwidth,” said Ramraj. The key, however, is having the right conversations between client and provider on how to manage the RACI, or responsibility assignment matrix.

People are key to the whole process, too. Two roles are especially important for a managed services provider implementing OpenStack for a client: the cloud ops support team and the DevOps cloud engineer. The former needs to be a highly skilled OpenStack specialist, not a generalist, while the latter is essential for making the OpenStack implementation work for the client, with apps to support business outcomes. Other roles include account delivery executives and the cloud service delivery managers. “The delivery managers enable governance roles in there, making sure they have regular conversations with the client around, ‘Hey, what’s your forward-looking capacity planning, how are we doing on handling your tickets,’ and making sure the governance is in there.”

Ultimately, Ramraj believes that the battle for OpenStack private clouds can be won. DXC’s work with American Airlines is a prime example as the two worked together to get a critical baggage handling app set up on OpenStack.

Winning the OpenStack battle at American Airlines

American Airlines’ Farhad Sayeed explained that his company has been virtualizing for more than 11 years. Two years ago, they started with a vision to create a hybrid solution with an internal data center and a public cloud. “During our journey,” said Sayeed, “our focus was to raise open-source usage across the enterprise. We already do quite a few, but we want to focus on them. The future of hosting must support our business agility, some portability, and DevOps culture.”

The journey was admittedly bumpy. First, networking presented a challenge. “We had to work very closely with our network team to set up a repeatable process where we can duplicate that over and over again in various environments,” he said. “In a large enterprise like ours, OpenStack doesn’t stand by itself. It has to integrate with existing networking, backup restore, monitoring, SLAs and so on.” Sayeed’s team ended up testing four different OpenStack distribution types. He recommends finding providers who already have a strong partnership with your company, a solid roadmap and a future.

Originally, American Airlines wanted to create their own platform with networking. “We quickly realized that we were much better off and would get there quicker if we started with the reference architecture and we ended up in a converged platform,” said Sayeed.

American’s OpenStack target workloads standardize on Cloud Foundry for both public and internal clouds. That was the first workload that went into production. “At American Airlines our goal for cloud, both internal and external, is PaaS first, platform as a service,” Sayeed said.

One such app, Bag Tracker, went live recently thanks to the partnership between American and DXC as a managed services provider. Bag Tracker is a web-based app that allows end-to-end tracking, from check-in to delivery, of passenger bags, priority parcels, and aircraft equipment. American’s standard Linux distribution today is Red Hat, although Sayeed’s team also sees a rising demand for Ubuntu, which they are testing as well.

Sayeed’s team is still deciding which container to go with. “Various teams are deploying Docker and Kubernetes in various configurations…” he said.

The journey to cloud is a cultural shift, said Sayeed. “Besides all the technical challenges that you will face, you must take care of the culture first,” he said. “All the different silos and disciplines, as we know, in an enterprise, have to be broken down.”

In addition, leadership must support what you’re doing. “My team and I were lucky enough to have full support from all the way from our CIO to my managing director during this journey,” said Sayeed. “That’s important. I cannot stress enough about that. Train and restructure your teams. Just because we work this way today, that doesn’t mean that’s how it needs to be. The cloud journey requires high collaboration. Work hand in hand with networking, security, middleware, application team. Everybody needs to come together.”

Sayeed and American Airlines have been testing four distributions of OpenStack. “We found out that less is more,” he said. “Folks who had done less customization and stayed close to the trunk found OpenStack easy to implement, easy to maintain, easy to upgrade and easy to roll out.”

Finally, bring your operations team in early. “You may have your own IT operations internally. You may have a managed service provider,” Sayeed said. “Don’t wait until you’re ready to go to production to bring them in. Bring them early. You’ll need them, they need you. Partner with them, and partner with them early.”

The toughest battle: Managing client expectations

Rick Mathot runs a global engineering and architecture team at DXC that has been deploying private clouds since 2009 (and OpenStack private clouds since 2015). While he notes that there are many technical challenges involved in making OpenStack work, his presentation focused on the human side.

In his keynote, OpenStack Foundation executive director Jonathan Bryce noted that OpenStack is no longer a science experiment. Mathot said that two-thirds of deployments are in production. “That means that now is the time that we’re going to be judged on our business value,” said Mathot. “The success of our outcomes will be judged relating to the business. That’s critical, because we need to understand what that value pertains to, (you need to) know the ‘why.’ Understanding your north star, as I call it, is going to get you through those periods where uncertainty prevails.”

It’s even tougher to gain ground when the troops aren’t open to new ideas, said Mathot. “We know how frustrating it is when a business comes to us with a solution and equally and likewise we shouldn’t be coming to them with preconceived solution artifacts (or) jargonized narrative relating to their ask,” he said.

CXOs are usually pretty nervous; business is in a time of significant change and providers need to help them allay their fears with language that they understand. “We’ve got to be able to articulate things simply in terms of what it is,” said Mathot. “For American Airlines, it was a scalable automated platform to host, to perform or provide PaaS to the organization.” In this case, it was the baggage tracking app – it was a simple use case, understood well by the business, and it worked within the OpenStack platform.

Mathot said that partnerships like this exist in a hybrid state — you need to be able to identify all the affected parties, all the stakeholders in a project. “Really, from a human side, bringing them into the tent with you is going to make them a part of the solution and stop some of the problems we’ve seen in the past of creating speed humps just for the sake of that,” he said.

Doing so will ensure that you’ll get support when you need it, which is generally when things are critical. “I think knowing some of the organizations I’ve been into, you may have to put it on a Post-It note and stick it on their screens,” said Mathot. “The reality is that you’ve got to do what you’ve got to do to get buy-in.” He said you need to lead from the front and not stop using mode-two thinking and design principles. “Just always make sure you keep an eye on the rear view mirror,” added Mathot, “to ensure that they’re still with you, the organization is still with you.”

Once you’ve finished the deployment and it’s up and running, you need to quantify what you’ve achieved, even beyond the technical value. “It’s a real simple trick,” said Mathot. “You’ve got to have a story for every level within your organization.” For example, while 20 percent IT efficiency is really great, explaining a 15 percent productivity gain for your application providers is probably a better story for them. Showing a faster speed to market for new useful client-facing apps is what will excite the CXOs, as well. You want to tune your story for each organizational level.

Finally, Mathot cautioned against a Facebook-style “better done than perfect” attitude. “Progress is important and any progress is better than none at all,” he said, “but what you shouldn’t look to do is have that (type of) mantra permeate every level of your deployment.”

Relevance is key, he said. “We all love the challenge and we all want to play with the newest technologies,” said Mathot. “It’s rewarding both personally and professionally, but the (American Airlines) story was a success because your bags arrived at the same airport at the same time as you did. It was useful and it was relevant.”

Catch the whole presentation below.

The post Challenges and battle scars from OpenStack deployments appeared first on OpenStack Superuser.

by Rob LeFebvre at January 22, 2018 09:42 AM

NFVPE @ Red Hat

TripleO AIDE Service

Just a quick post about some patches that have landed in TripleO (upstream Red Hat OSP Director) to deploy AIDE to the overcloud. What’s AIDE? AIDE (Advanced...

by Luke Hinds at January 22, 2018 12:00 AM


Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology you should add your OpenStack blog.


Last updated:
February 23, 2018 12:36 AM
All times are UTC.
