February 18, 2020

Carlos Camacho

TripleO deep dive session #14 (Containerized deployments without paunch)

This is the 14th release of the TripleO “Deep Dive” sessions.

Thanks to Emilien Macchi for this deep dive session about the status of the containerized deployment without Paunch.

You can access the presentation.

So please, check the full session content on the TripleO YouTube channel.



Please check the sessions index to have access to all available content.

by Carlos Camacho at February 18, 2020 12:00 AM

February 17, 2020

CSC.fi Cloud Blog

CentOS8 images published in Pouta as a tech preview

There are now CentOS-8 images available in Pouta!


There are some minor issues with the upstream CentOS8 images, so, for now, they are considered to be in "tech preview".

We have worked around the issues we have found so far by temporarily modifying the image to use "cloud-user" and to remove the resolv.conf leftovers.

Basic information about our images can be found on docs.csc.fi

One issue is that /etc/resolv.conf sometimes has a nameserver defined from the build of the image. There is an open CentOS bug report about this: https://bugs.centos.org/view.php?id=16948

by Johan G (noreply@blogger.com) at February 17, 2020 05:48 AM

February 14, 2020

StackHPC Team Blog

SR-IOV Networking in Kayobe

A Brief Introduction to Single-Root I/O Virtualisation (SR-IOV)

In a virtualised environment, SR-IOV enables closer access to the underlying hardware, trading operational flexibility for greater performance.

This involves the creation of virtual functions (VFs), which are presented as copies of the physical function (PF) of the hardware device. A VF is passed through to a VM, bypassing the hypervisor operating system for network activity. The principles of SR-IOV are presented in slightly greater depth in a short Intel white paper, and the OpenStack fundamentals are described in the Neutron online documentation.
A VF can be bound to a given VLAN, or (on some hardware, such as recent Mellanox NICs) it can be bound to a given VXLAN VNI. The result is direct access to a physical NIC attached to a tenant or provider network.

Note that there is no support for security groups or similar richer network functionality as the VM is directly connected to the physical network infrastructure, which provides no interface for injecting firewall rules or other externally managed packet handling.
Mellanox also offer a more advanced capability, known as ASAP2, which builds on SR-IOV to also offload Open vSwitch (OVS) functions from the hypervisor. This is more complex and not in scope for this investigation.

Setup for SR-IOV

Aside from OpenStack, deployment of SR-IOV involves configuration at many levels.

  • BIOS needs to be configured to enable both Virtualization Technology and SR-IOV.

  • Mellanox NIC firmware must be configured to enable the creation of SR-IOV VFs and define the maximum number of VFs to support. This requires the installation of the Mellanox Firmware Tools (MFT) package from Mellanox OFED.

  • Kernel boot parameters are required to support direct access to SR-IOV hardware:

    intel_iommu=on iommu=pt
    
  • A number of VFs can be created by writing the required number to a file under /sys, for example: /sys/class/net/eno6/device/sriov_numvfs

    NOTE: Certain NIC models (e.g. the Mellanox ConnectX-3) do not support management via sysfs; these must be configured using modprobe (see the modprobe.d man page).

  • This is typically done via a udev trigger script on insertion of the PF device, as sketched below. The upper limit on the number of VFs is given by another (read-only) file, sriov_totalvfs, in the same directory.
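
A minimal sketch of that sysfs interface, assuming a PF named eno6 and a hypothetical udev rule file (the VF count of 8 is illustrative):

    # Check the read-only upper limit, then create 8 VFs on the PF
    cat /sys/class/net/eno6/device/sriov_totalvfs
    echo 8 > /sys/class/net/eno6/device/sriov_numvfs

    # Persistent variant: a udev rule (e.g. /etc/udev/rules.d/70-sriov.rules)
    # that sets the attribute when the PF appears
    ACTION=="add", SUBSYSTEM=="net", KERNEL=="eno6", ATTR{device/sriov_numvfs}="8"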

As a framework for managing infrastructure as code with Ansible at every level, Kayobe supports running custom Ansible playbooks against the inventory and groups of the infrastructure deployment. Over time StackHPC has developed a number of roles to perform additional configuration as a custom site playbook. A recent addition is a Galaxy role for SR-IOV setup.

A simple custom site playbook could look like this:

---
- name: Configure SR-IOV
  hosts: compute_sriov
  tasks:
    - include_role:
        name: stackhpc.sriov
  handlers:
    - name: reboot
      include_tasks: tasks/reboot.yml
      tags: reboot
...

This playbook would then be invoked from the Kayobe CLI:

(kayobe) $ kayobe playbook run sriov.yml

Once the system is prepared for supporting SR-IOV, OpenStack configuration is required to enable VF resource management, scheduling according to VF availability, and pass-through of the VF to VMs that request it.

SR-IOV and LAGs

An additional complication might be that hypervisors use bonded NICs to provide network access for VMs. This provides greater fault tolerance. However, a VF is normally associated with only one PF (and the two PFs in a bond would lead to inconsistent connectivity).

Mellanox NICs have a feature, VF-LAG, which claims to enable SR-IOV to work in configurations where the ports of a 2-port NIC are bonded together.

Setup for VF-LAG requires additional steps and complexities, and we'll be covering it in greater detail in another blog post soon.

Nova Configuration

Scheduling with Hardware Resource Awareness

SR-IOV VFs are managed in the same way as PCI-passthrough hardware (e.g. GPUs). Each VF is managed as a hardware resource. The Nova scheduler must be configured not to schedule instances requesting SR-IOV resources onto hypervisors with none available. This is done using the PciPassthroughFilter scheduler filter.

In Kayobe config, the Nova scheduler filters are configured by defining non-default parameters in nova.conf. In the kayobe-config repo, add this to etc/kayobe/kolla/config/nova.conf:

[filter_scheduler]
available_filters = nova.scheduler.filters.all_filters
enabled_filters = other-filters,PciPassthroughFilter

(The other filters listed may vary according to other configuration applied to the system).

Hypervisor Hardware Resources for Passthrough

The nova-compute service on each hypervisor requires configuration to define which hardware/VF resources are made available for passthrough to VMs. In addition, for infrastructure with multiple physical networks, an association must be made to define which VFs connect to which physical network. This is done by defining a whitelist (pci_passthrough_whitelist) of available hardware resources on the compute hypervisors. This can be tricky to configure if the available resources differ across multiple variants of hypervisor hardware specification. One solution using Kayobe's inventory is to define whitelist hardware mappings either globally, in group variables, or even in individual host variables, as follows:

# Physnet to device mappings for SR-IOV, used for the pci
# passthrough whitelist and sriov-agent configs
sriov_physnet_mappings:
  p4p1: physnet2

This state can then be applied by adding a macro-expanded term to etc/kayobe/kolla/config/nova.conf:

{% raw %}
[pci]
passthrough_whitelist = [{% for dev, physnet in sriov_physnet_mappings.items() %}{{ (loop.index0 > 0)|ternary(',','') }}{ "devname": "{{ dev }}", "physical_network": "{{ physnet }}" }{% endfor %}]
{% endraw %}
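
With the example mapping above, this template renders to a single whitelist entry, roughly:

    [pci]
    passthrough_whitelist = [{ "devname": "p4p1", "physical_network": "physnet2" }]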

We have used the network device name for designation here, but other options are available:

  • devname: network-device-name
    (as used above)
  • address: pci-bus-address
    Takes the form [[[[<domain>]:]<bus>]:][<slot>][.[<function>]].
    This is a good way of unambiguously selecting a single device in the hardware device tree.
  • address: mac-address
    Can be wild-carded.
    Useful if the vendor of the SR-IOV NIC is different from all other NICs in the configuration, so that selection can be made by OUI.
  • vendor_id: pci-vendor product_id: pci-device
    A good option for selecting a single hardware device model, wherever they are located.
    These values are 4-digit hexadecimal (but the conventional 0x prefix is not required).

The vendor ID and device ID are available from lspci -nn (or lspci -x for the hard-core). The IDs supplied should be those of the physical function (PF), not the virtual functions, whose IDs may be slightly different.
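
For example, something along these lines (the device and the IDs shown are illustrative):

    $ lspci -nn | grep -i mellanox
    3b:00.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]

Here vendor_id would be 15b3 and product_id would be 1017.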

Neutron Configuration

Kolla-Ansible documents SR-IOV configuration well here: https://docs.openstack.org/kolla-ansible/latest/reference/networking/sriov.html.
See https://docs.openstack.org/neutron/train/admin/config-sriov.html for full details from Neutron's documentation.
For Kayobe configuration, we set a global flag kolla_enable_neutron_sriov in etc/kayobe/kolla.yml:
kolla_enable_neutron_sriov: true

Neutron Server

SR-IOV usually connects to VLANs; here we assume Neutron has already been configured to support this. The sriovnicswitch ML2 mechanism driver must be enabled. In Kayobe config, this is added to etc/kayobe/neutron.yml:

# List of Neutron ML2 mechanism drivers to use. If unset the kolla-ansible
# defaults will be used.
kolla_neutron_ml2_mechanism_drivers:
  - openvswitch
  - l2population
  - sriovnicswitch

Neutron SR-IOV NIC Agent

Neutron requires an additional agent to run on compute hypervisors with SR-IOV resources. The SR-IOV agent must be configured with mappings between physical network name and the interface name of the SR-IOV PF. In Kayobe config, this should be added in a file etc/kayobe/kolla/config/neutron/sriov_agent.ini. Again we can do an expansion using the variables drawn from Kayobe config's inventory and extra variables:

{% raw %}
[sriov_nic]
physical_device_mappings = {% for dev, physnet in sriov_physnet_mappings.items() %}{{ (loop.index0 > 0)|ternary(',','') }}{{ physnet }}:{{ dev }}{% endfor %}
exclude_devices =
{% endraw %}
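
With the example mapping above, this renders to physical_device_mappings = physnet2:p4p1. Once Nova and Neutron are configured, an SR-IOV VF is consumed by creating a port with VNIC type direct and booting an instance with it. A minimal sketch of the user-facing workflow (network, flavor and image names are placeholders):

    openstack port create --network tenant-vlan-net --vnic-type direct sriov-port0
    openstack server create --flavor m1.large --image CentOS7 \
      --nic port-id=<port-id-from-previous-step> sriov-vm0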

by Michal Nasiadka at February 14, 2020 10:00 AM

February 12, 2020

OpenStack Superuser

OpenStack Case Study: CloudVPS

CloudVPS is one of the largest independent Dutch OpenStack providers, delivering advanced cloud solutions. With a team of 15 people, CloudVPS was one of the first in Europe to get started with OpenStack, and they are leading the development of the scalable open source platform.

At the Open Infrastructure Shanghai Summit in November 2019, Superuser got a chance to talk with the OpenStack engineers from CloudVPS about why they chose OpenStack for their organization and how they use it.

What are some of the open source projects you are using?

Currently, we are using OpenStack, Oxwall, Salt, Tungsten Fabric, Gitlab and a few more. We have not yet started to use the open source projects that are hosted by the OpenStack Foundation, but we are planning on it. 

Why do you choose to use OpenStack?

We have used OpenStack for a long time. At the very beginning, we added Hyper-V hypervisors for Windows VMs before we built our own orchestration layer. After about three to four years, when OpenStack came out, we started our first OpenStack platform to do public cloud. The main reason that we started to use OpenStack is the high growth potential that we see in it. OpenStack’s features and its community size are big parts of the reason as well. In addition, OpenStack’s stability and maturity are particularly important to us right now. Upgradability is also a key factor for our team. In terms of our partnership with Mirantis, upgradability is the biggest reason why we chose to partner with them instead of doing it ourselves.

What workloads are you running on OpenStack?

We don’t know the exact workloads, but basically all of it. What we do know is that we see web services on there and also platforms for large newspapers in the Netherlands, Belgium, Germany, and other countries around the world. It really varies, and we have all kinds of workloads. For the newspapers, we have conversion workloads for images. We also have an office automation environment like the Windows machine. There are some customers who run containers on top of it. Overall, there are definitely more workloads, but we don’t know all of it.

How large is your OpenStack deployment?

We have two deployments. In total, we have over 10,000 instances and 400-500 nodes.

Stay informed:

Interested in information about the OpenStack Foundation and its projects? Stay up to date on OpenStack and the Open Infrastructure community today!

The post OpenStack Case Study: CloudVPS appeared first on Superuser.

by Sunny Cai at February 12, 2020 06:11 PM

Stephen Finucane

VCPUs, PCPUs and Placement

In a previous blog post, I’d described how instance NUMA topologies and CPU pinning worked in the OpenStack Compute service (nova). Starting with the 20.

February 12, 2020 12:00 AM

February 10, 2020

SWITCH Cloud Blog

RadosGW/Keystone Integration Performance Issues—Finally Solved?

For several years we have been running OpenStack and Ceph clusters as part of SWITCHengines, an IaaS offering for the Swiss academic community. Initially, our main “job” for Ceph was to provide scalable block storage for OpenStack VMs—which it does quite well. But we also provided S3 (and Swift, but that’s outside the scope of this post) -based object storage via RadosGW from early on. This easy-to-use object storage turned out to be popular far beyond our initial expectations.

One valuable feature of RadosGW is that it integrates with Keystone, the Authentication and Authorization service in OpenStack. This meant that any user of our OpenStack offering can create, within her Project/tenant, EC2-compatible credentials to set up, and manage access to, S3 object store buckets. And they sure did! SWITCHengines users started to use our object store to store videos (and stream them directly from our object store to users’ browsers), research data for archival and dissemination, external copies from (parts of) their enterprise backup systems, and presumably many other interesting things; a “defining characteristic” of the cloud is that you don’t have to ask for permission (see “On-demand self-service” in the NIST Cloud definition)—though as a community cloud provider, we are happy to hear about, and help with, specific use cases.

Now this sounds pretty close to cloud nirvana, but… there was a problem: Each time a client made an authenticated (signed) S3 request on any bucket, RadosGW had to outsource the validation of the request signature to Keystone, which would return either the identity of the authenticated user (which RadosGW could then use for authorization purposes), or a negative reply in case the signature doesn’t validate. Unfortunately, this outsourced signature validation turns out to bring significant per-request overhead. In fact, for “easy” requests such as reading and writing small objects, this authentication overhead easily dominates total processing time. For a sense of the magnitude, small requests without Keystone validation often take <10ms to complete (according to the logs of our NGinx-based HTTPS server that acts as a front end to the RadosGW nodes), whereas any request involving Keystone takes at least 600ms.

One undesirable effect is that our users probably wonder why simple requests have such a high baseline response time. Transfers of large objects don’t care much, because at some point the processing time is dominated by Rados/network transfer time of user data.

But an even worse effect is that S3 users could, by using client software that “aggressively” exploited parallelism, put very high load on our Keystone service, to the point that OpenStack operations sometimes ran into timeouts when they needed to use the authentication/authorization service.

In our struggle to cope with this recurring issue, we found a somewhat ugly workaround: When we found an EC2 credential in Keystone whose use in S3/RadosGW contributed significant load, we extracted that credential (basically an ID/secret pair) from Keystone, and provisioned it locally in all of our RadosGW instances. This always solved the individual performance problem for that client: response times dropped by 600ms immediately, and load on our Keystone system subsided.
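
In outline, that workaround looked roughly like the following (illustrative commands; the user reference and key values are placeholders, and the RadosGW user corresponding to the Keystone project is assumed to exist already):

    # Extract the EC2 credential (access/secret pair) from Keystone
    openstack ec2 credentials list --user <user>

    # Provision the same key pair locally on the RadosGW side, so signature
    # validation no longer needs a round trip to Keystone
    radosgw-admin key create --uid=<rgw-user-for-project> --key-type=s3 \
        --access-key=<ec2-access-key> --secret=<ec2-secret-key>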

While the workaround fixed our immediate troubles, it was deeply unsatisfying in several ways:

  • Need to identify “problematic” S3 uses that caused high Keystone load
  • Need to (more or less manually) re-provision Keystone credentials in RadosGW
  • Risk of “credential drift” in case the Keystone credentials changed (or disappeared) after their re-provisioning in RadosGW—the result would be that clients would still be able to access resources that they shouldn’t (anymore).

But the situation was bearable for us, and we basically resigned to having to fix performance emergencies every once in a while until maybe one day, someone would write a Python script or something that would synchronize EC2 credentials between Keystone and RadosGW…

PR #26095: A New Hope

But then out of the blue, James Weaver from the BBC contributed PR #26095, rgw: Added caching for S3 credentials retrieved from keystone. This changes the approach to signature validation when credentials are found in Keystone: The key material (including secret key) found in Keystone is cached by RadosGW, and RadosGW always performs signature validation locally.

James’s change was merged into master and will presumably come out with the “O” release of Ceph. We run Nautilus, and when we got wind of this change, we were excited to try it out. We had some discussions as to whether the patch might be backported to Nautilus; in the end we considered that unlikely at the current state, because the patch unconditionally changes the behavior in a way that could violate some security assumptions (e.g. that EC2 secrets would never leave Keystone).

We usually avoid carrying local patches, but in this case we were sufficiently motivated to go and cherry-pick the change on top of the version we were running (initially v14.2.5, later v14.2.6 and v14.2.7). We basically followed the instructions on how to build Ceph, but after cloning the Ceph repo, ran

git checkout v14.2.7
git cherry-pick affb7d396f76273e885cfdbcd363c1882496726c -m 1 -v
Then edit debian/changelog and prepend:

ceph (14.2.7-1bionic-switch1) stable; urgency=medium

  * Cherry-picked upstream pull #26095:

    rgw: Added caching for S3 credentials retrieved from keystone

 -- Simon Leinen <simon.leinen@switch.ch>  Thu, 01 Feb 2020 19:51:21 +0000

Then, dpkg-buildpackage and wait for a couple of hours…

First Results

We tested the resulting RadosGW package in our staging environment for a couple of days before trying them in our production clusters.

When we activated the patched RadosGW in production, the effects were immediately visible: The CPU load of our Keystone system went down by orders of magnitude.

(Graph: CPU load of our Keystone service before and after deploying the patched RadosGW.)

On 2020-01-27 at around 08:00, we upgraded our first production cluster’s RadosGWs. Twenty-four hours later, we upgraded the RadosGWs on the second cluster. The baseline load on our Keystone service dropped visibly on the first upgrade, but some high load peaks could still be seen. Since the second region was upgraded, no sharp peaks anymore. There is a periodic load increase every night between 03:10 and 04:10, presumably due to some charging/accounting system doing its thing. Probably these peaks were “always” there, but they only became apparent once we started deploying the credential-caching code.

The 95th-percentile latency of “small” requests (defined as both $body_bytes_sent and $request_length being lower than 65536) was reduced from ~750ms to ~100ms:

(Graph: 95th-percentile latency of small requests.)

Conclusion and Outlook

We owe the BBC a beer.

To make the patch perfect, maybe it would be cool to limit the lifetime of cached credentials to some reasonable value such as a few hours. This could limit the damage in case credentials should be invalidated. Though I guess you could just restart all RadosGW processes and lose any cached credentials immediately.

If you are interested in using our RadosGW packages made by cherry-picking PR #26095 on top of Nautilus, please contact us. Note that we only have x86_64 packages for Ubuntu 18.04 “Bionic” GNU/Linux.

by Simon Leinen at February 10, 2020 08:09 AM

February 05, 2020

Mirantis

Get Your Windows Apps Ready for Kubernetes

Historically, the Kubernetes orchestrator has been focused on Linux-based workloads, but Windows has started to play a larger part in the ecosystem.

by Steven Follis at February 05, 2020 04:45 PM

RDO

Migration Paths for RDO From CentOS 7 to 8

At the last CentOS Dojo, it was asked whether RDO would provide python3 packages for OpenStack Ussuri on CentOS 7, and whether it would be “possible” in the context of helping with the upgrade path from Train to Ussuri. As “possible” is a vague term and I think the response deserves more explanation than a binary one, I’ve collected my thoughts on this topic as a way to start a discussion within the RDO community.

Yes, upgrades are hard

We all know that upgrading a production OpenStack cloud is complex and depends strongly on each specific layout, deployment tool (different deployment tools may or may not support OpenStack upgrades) and process. In addition, upgrading from CentOS 7 to 8 requires an OS redeploy, which adds operational complexity to the migration. We are committed to helping RDO community users migrate their clouds to new versions of OpenStack and/or operating systems in several ways:
  • Providing RDO Train packages on CentOS 8. This allows users to choose between a one-step upgrade from CentOS7/Train -> CentOS8/Ussuri or splitting it into two steps: CentOS7/Train -> CentOS8/Train -> CentOS8/Ussuri.
  • RDO maintains OpenStack packages during the whole upstream maintenance cycle for the Train release, that is, until April 2021, so operators can take some time to plan and execute their migration paths.
Also, the Rolling Upgrades features provided by OpenStack allow agents on compute nodes to keep running Train temporarily after the controllers have been updated to Ussuri, using Upgrade Levels in Nova or the built-in backwards-compatibility features in Neutron and other services.
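
The Nova part of that approach is typically a one-line pin on the upgraded controllers; an illustrative nova.conf snippet:

    [upgrade_levels]
    compute = auto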

What “Supporting an OpenStack release in a CentOS version” means in RDO

Before discussing the limitations and challenges of supporting RDO Ussuri on CentOS 7.7 using python 3, I’ll describe what supporting a new RDO release means:

Build

  • Before we can start building OpenStack packages, we need all the dependencies required to build or run OpenStack services. We use the libraries from the CentOS base repos as much as we can and avoid rebasing or forking CentOS base packages unless it’s strongly justified.
  • OpenStack packages are built using DLRN in the RDO Trunk repos or in CBS, using jobs running in the post pipeline of review.rdoproject.org.
  • RDO also consumes packages from other CentOS SIGs, such as Ceph from the Storage SIG, KVM from the Virtualization SIG, and collectd from OpsTools.

Validate

  • We run CI jobs periodically to validate the packages provided in the repos. These jobs are executed using the Zuul instance in SoftwareFactory project or Jenkins in CentOS CI infra and deploy different configurations of OpenStack using Packstack, puppet-openstack-integration and TripleO.
  • Also, some upstream projects include CI jobs on CentOS using the RDO packages to gate every change on it.

Publish

  • RDO Trunk packages are published in https://trunk.rdoproject.org and validated repositories are moved to promoted links.
  • RDO CloudSIG packages are published in official CentOS mirrors after they are validated by CI jobs.

Challenges to provide python 3 packages for RDO Ussuri in CentOS 7

Build

  • While CentOS 7 includes a quite wide set of python 2 modules (150+) in addition to the interpreter, the python 3 stack included in CentOS 7.7 is just the python interpreter and ~5 python modules. All the missing ones would need to be bootstrapped for python3.
  • Some python bindings are provided as part of other builds, e.g. python-rbd and python-rados are part of Ceph in the Storage SIG, python-libguestfs is part of libguestfs in the base repo, etc… RDO doesn’t own those packages, so commitment from the owners would be needed, or RDO would need to take ownership of them for this specific release (which means maintaining them until Train EOL).
  • Current specs in Ussuri tie the python version to the CentOS version. We’d need to figure out a way to switch the python version on CentOS 7 via tooling configuration and macros.

Validate

  • In order to validate the python3 builds for Ussuri on CentOS 7, the deployment tools (puppet-openstack, packstack, kolla and TripleO) would need upstream fixes to install python3 packages instead of python2 ones on CentOS 7. Ideally, new CI jobs should be added with this configuration to gate changes in those repositories. This would require support from the upstream communities.

Conclusion

  • Alternatives exist to help operators in the migration path from Train on CentOS 7 to Ussuri on CentOS 8 and avoid a massive full cloud reboot.
  • Doing a fully supported RDO release of Ussuri on CentOS 7 would require a big effort from RDO and other projects that can’t be made with existing resources:
    • It would require a full bootstrap of the python3 dependencies, which are currently pulled from the CentOS base repositories as python 2 packages.
    • Other SIGs would need to provide python3 packages or, alternatively, RDO would need to maintain them for this specific release.
    • In order to validate the release, upstream deployment projects would need to support this new python3 release on CentOS 7.
  • There may be room for intermediate solutions limited to a reduced set of packages that would help in the transition period. We’d need to hear details from interested community members about what would actually be needed and what the desired migration workflow is. We will be happy to onboard new community members with an interest in contributing to this effort.
We are open to listening and discussing what other options may help users, so come to us and let us know how we can do it.

by amoralej at February 05, 2020 02:04 PM

February 04, 2020

OpenStack Superuser

Inside open infrastructure: The latest from the OpenStack Foundation

Welcome to the latest edition of the OpenStack Foundation Open Infrastructure newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

Spotlight on 2019 OpenStack Foundation Annual Report

The OSF community had a productive year, merging 58,000 code changes to produce open source infrastructure software like Airship, Kata Containers, StarlingX, and Zuul, along with the third most active OSS project in the world, OpenStack. With 100,000 members and millions more visiting OSF websites in 2019 to get involved, the community made huge strides in addressing the $7.7 billion market for OpenStack and more than $12 billion combined OpenStack & containers markets in the future.

Each individual member, working group, SIG, and contributor was instrumental in continuing to support the OSF mission: helping people build and operate open infrastructure. The OSF 2019 Annual Report was published today highlighting the achievements across the community and the goals for the year ahead.

Let’s break down some of the highlights of last year:

  • The OSF confirmed three new open infrastructure projects to complement OpenStack in powering the world’s open infrastructure;
  • OpenStack is one of the top three most active open source projects in number of changes, and is projected to be a $7.7 billion USD market by 2023;
  • Some of the world’s largest brands—AT&T, Baidu, Blizzard Entertainment, BMW, China UnionPay, Walmart, and Volvo among others—shared their open source infrastructure use cases and learnings;
  • Upstream contributors continued to prioritize cross-project integration with open source projects including Ceph, Kubernetes, Ansible, and Tungsten Fabric.
  • New contributors were on-boarded through multiple internship and mentoring programs as well as OpenStack Upstream Institute, which was held in seven countries last year!

The OSF would like to extend a huge thanks to the global community for all of the work that went into 2019 and is continuing in 2020 to help people build and operate open source infrastructure. Check out the full OSF 2019 Annual Report on the OpenStack website!

OpenStack Foundation (OSF)

  • The results for the 2020 election of Individual Directors are in! Congratulations to all the elected 2020 OpenStack Foundation Board of Directors! Check out the results.
  • New Event! OpenDev+PTG
    • June 8-11 in Vancouver, BC
    • OpenDev + PTG is a collaborative event organized by the OpenStack Foundation gathering developers, system architects, and operators to address common open source infrastructure challenges.
    • Registration is now open!
    • Programming Committee information is available now. Sponsorship information will be coming soon.
  • The next Open Infrastructure Summit will be held this fall, October 19-23, at the bcc Berlin Congress Center in Germany. Registration and sponsorships will be available soon, stay tuned for details!

Airship: Elevate your infrastructure

  • Airship Blog Series 5 – Drydock and Its Relationship to Cluster API – As part of the evolution of Airship 1.0, an enduring goal has been to support multiple provisioning backends beyond just bare metal, including backends that can provision to third-party clouds and other targets such as OpenStack VMs, as well as letting you bring your own infrastructure. Read how Drydock is used to accomplish this.
  • Check out the Airship YouTube playlist and see the Airship content that you might have missed at the Shanghai Summit.
  • Interested in getting involved? Check out this page.

Kata Containers: The speed of containers, the security of VMs

  • Kata Containers 1.9.4 and 1.10.0 releases are available now! The 1.10.0 release highlights include initial support for Cloud Hypervisor, HybridVsock support for Cloud Hypervisor and Firecracker, an updated Firecracker version (v0.19.1), and better rootless support for Firecracker. This release also deprecates the bridged networking model.
  • 2019 was a breakthrough year with production deployments and many milestones of Kata Containers. Check out what the community had accomplished in the past year and Kata Containers’ project update in the 2019 OpenStack Foundation Annual Report!
  • Looking for the 2020 Architecture Committee meeting agenda? See this meeting notes etherpad.

OpenStack: Open source software for creating private and public clouds

  • The community goals for the Ussuri development cycle have been finalized: dropping Python 2.7 support (championed by Ghanshyam Mann), and project-specific PTL and contributor documentation (championed by Kendall Nelson). Those should be completed by the Ussuri release, which is scheduled to happen on May 13, 2020.
  • The name for the release after Ussuri has been selected. It will be called Victoria, after the capital of British Columbia, where our next PTG will happen.
  • Special Interest Groups bring together users and developers interested in supporting a specific use case for OpenStack. Two new SIGs were recently formed. The Large Scale SIG wants to push back scaling limits within a given cluster and document better configuration defaults for large-scale deployments. The Multi-arch SIG wants to better support OpenStack on CPU architectures other than x86_64. If you’re interested in those topics, please join those SIGs!
  • Interested in getting involved in the OpenStack community, but you don’t know where to start? Want to jump into a project, but you don’t know anyone? The First Contact SIG can help! For more information, you can check out their wiki page. They have regular biweekly meetings and hang out in the #openstack-dev and #openstack-upstream-institute IRC channels ready to answer your questions!

StarlingX: A fully featured cloud for the distributed edge

  • StarlingX 3.0 is now available! It integrates the Train version of OpenStack, adds improvements to the areas of container and hardware acceleration support, and delivers a new functionality called Distributed Cloud architecture. Check out the release notes for further details or download the ISO image and start playing with the software!
  • The community has been focusing on increasing test coverage and running a remote hackathon. Check their etherpad for more details and keep an eye out for updates on the starlingx-discuss mailing list.
  • The next StarlingX Community Meetup is taking place on March 3-4 in Chandler, Arizona. If you would like to attend on site please register on the planning etherpad as soon as you can!

Zuul: Stop merging broken code

  • The Zuul Project Lead position has been renewed, and the maintainers have chosen James Blair to lead them through the 2020 term.
  • A significant overhaul of Zuul’s service documentation is underway, with the goal of making it easier for users to find the information they need.
  • December and January saw four minor releases of Zuul (3.12.0-3.15.0) and two for Nodepool (3.10.0 and 3.11.0). Among a slew of other improvements, these switched the default Ansible version from 2.7 to 2.8, added support for the latest version (2.9), deprecated the most recent EOL version (2.6) and removed support for its predecessor (2.5). This follows a more consistent Ansible support lifecycle plan, which is in the process of being formalized.

Find the OSF at these upcoming Open Infrastructure community events

March/April

May

  • May 4: OpenStack Day DR Congo

June

July

  • July 15: Cloud Operator Day Japan

October

  • October 19-23: Open Infrastructure Summit Berlin

For more information about these events, please contact denise@openstack.org

Questions / feedback / contribute

This newsletter is edited by the OpenStack Foundation staff to highlight open infrastructure communities. We want to hear from you! If you have any feedback, news or stories that you want to share, reach us through community@openstack.org.

The post Inside open infrastructure: The latest from the OpenStack Foundation appeared first on Superuser.

by Sunny Cai at February 04, 2020 04:29 PM

February 01, 2020

Stephen Finucane

Will Someone *Please* Tell Me What’s Going On? (Redux)

This was a talk I gave at FOSDEM 2020. I had previously given this talk at PyCon Limerick. The summary is repeated below. Software rarely stands still (unless it’s TeX).

February 01, 2020 12:00 AM

January 30, 2020

Galera Cluster by Codership

Improved Cloud (WAN) performance with Galera Cluster MySQL 5.6.47 and 5.7.29

Codership is pleased to announce a new Generally Available (GA) release of Galera Cluster for MySQL 5.6 and 5.7, consisting of MySQL-wsrep 5.6.47 (release notes, download) and MySQL-wsrep 5.7.29 (release notes, download) with a new Galera Replication library 3.29 (release notes, download), implementing wsrep API version 25. This release incorporates all changes to MySQL 5.6.47 and 5.7.29 respectively.

A highlight for improved WAN performance (great for cloud environments; well-tuned networks will perform faster) in Galera Replication library 3.29 is a new socket.recv_buf_size setting that defaults to auto, which allows the underlying kernel (be it on Linux or FreeBSD) to tune the TCP receive buffer. A new socket.send_buf_size option also exists, likewise defaulting to auto, to allow tuning of the send buffer.
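
For reference, a minimal sketch of how these provider options can be set in the [mysqld] section of the server configuration (the values shown are simply the new defaults):

    wsrep_provider_options="socket.recv_buf_size=auto;socket.send_buf_size=auto"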

Compared to the previous release, the Galera Replication library has a few other enhancements: an issue where the Group Communication System (GCS) library would process JOIN messages even if the node was in a DONOR state has been fixed, as has an issue where GCache could occasionally have different histories from different clusters. The reliability of time-stamping and liveness checking of connections between cluster nodes was improved to eliminate false positives, especially during the replication of large transactions. With segmentation configured, the delivery latency of relayed messages between segments was improved by introducing fair queuing of outbound messages. The communications protocol around the GComm EVS (Extended Virtual Synchrony) layer now uses point-to-point messaging to deliver missing messages, and there is rate limiting on retransmission requests. Quorum computation was hardened to maintain consistency during rapid configuration changes.

MySQL 5.6.47 with Galera Replication library 3.29 is an updated rebase which can be considered a maintenance release.

MySQL 5.7.29 with Galera Replication library 3.29 is an updated rebase, which can mostly be considered a maintenance release. It does fix a notable issue since the last release, in where there was an occasional hang during server shutdown due to an occasional race condition received from a client connection stuck waiting for a network event.

You can get the latest release of Galera Cluster from http://www.galeracluster.com. There are package repositories for Debian, Ubuntu, CentOS, RHEL, OpenSUSE and SLES. The latest versions are also available via the FreeBSD Ports Collection.

by Colin Charles at January 30, 2020 03:34 PM

January 29, 2020

Nate Johnston

Home Lab Part 1

Working as a full time upstream developer in OpenStack means that sometimes I need to run a full cloud environment in order to determine if a new feature works, or debug some kind of issue that someone has reported. This requires more capability than a standard laptop can provide: I need a machine that is a very basic enterprise-class server. And it needs to be able to have plenty of memory.

January 29, 2020 09:53 PM

January 22, 2020

RDO

Community Blog Round Up 20 January 2020

We’re super chuffed to see another THREE posts from our illustrious community – Adam Young talks about api port failure and speed bumps while Lars explores literate programming.

Shift on Stack: api_port failure by Adam Young

I finally got a right-sized flavor for an OpenShift deployment: 25 GB Disk, 4 VCPU, 16 GB Ram. With that, I tore down the old cluster and tried to redeploy. Right now, the deploy is failing at the stage of the controller nodes querying the API port. What is going on?

Read more at https://adam.younglogic.com/2020/01/shift-on-stack-api_port-failure/

Self Service Speedbumps by Adam Young

The OpenShift installer is fairly specific in what it requires, and will not install into a virtual machine that does not have sufficient resources. These limits are 16 GB RAM, 4 Virtual CPUs, and 25 GB Disk Space. This is fairly frustrating if your cloud provider does not give you a flavor that matches this. The last item specifically is an artificial limitation as you can always create an additional disk and mount it, but the installer does not know to do that.

Read more at https://adam.younglogic.com/2020/01/self-service-speedbumps/

Snarl: A tool for literate blogging by Lars Kellogg-Stedman

Literate programming is a programming paradigm introduced by Donald Knuth in which a program is combined with its documentation to form a single document. Tools are then used to extract the documentation for viewing or typesetting or to extract the program code so it can be compiled and/or run. While I have never been very enthusiastic about literate programming as a development methodology, I was recently inspired to explore these ideas as they relate to the sort of technical writing I do for this blog.

Read more at https://blog.oddbit.com/post/2020-01-15-snarl-a-tool-for-literate-blog/

by Rain Leander at January 22, 2020 09:09 PM

January 20, 2020

Emilien Macchi

Developer workflow with TripleO

In this post we’ll see how one can use TripleO for developing & testing changes into OpenStack Python-based projects (e.g. Keystone).

 

Even if Devstack remains a popular tool, it is not the only one you can use for your development workflow.

TripleO hasn’t been built only for real-world deployments; it also works well for developers working on OpenStack-related projects like Keystone.

Let’s say, my Keystone directory where I’m writing code is in /home/emilien/git/openstack/keystone.

Now I want to deploy TripleO with that change to my code in Keystone. For that I will need a server (it can be a VM) with at least 8GB of RAM, 4 vCPUs, 80GB of disk, 2 NICs and CentOS7 or Fedora28 installed.

Prepare the repositories and install python-tripleoclient:
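
One way to do this on CentOS 7, sketched with the tripleo-repos helper (the package names are assumptions and differ between platforms and releases):

    sudo yum install -y python2-tripleo-repos      # assumed package name
    sudo tripleo-repos -b master current           # enable the RDO Trunk "current" repos
    sudo yum install -y python-tripleoclient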

If you’re deploying on recent Fedora or RHEL8, you’ll need to install python3-tripleoclient.

Now, let’s prepare your environment and deploy TripleO:
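
A minimal sketch of the Standalone deployment command (the environment file paths, IP address and output directory are illustrative and depend on your release and environment):

    sudo openstack tripleo deploy \
      --templates \
      --standalone \
      --local-ip=192.168.24.2/24 \
      -r /usr/share/openstack-tripleo-heat-templates/roles/Standalone.yaml \
      -e /usr/share/openstack-tripleo-heat-templates/environments/standalone/standalone-tripleo.yaml \
      -e $HOME/containers-prepare-parameters.yaml \
      -e $HOME/standalone_parameters.yaml \
      --output-dir $HOME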

Note: change the YAML for your own needs if needed. If you need more help on how to configure Standalone, please check out the official manual.

Now let’s say your code needs a change and you need to retest it. Once you’ve modified your code, just run:

Now, if you need to test a review that is already pushed in Gerrit and you want to run a fresh deployment with it, you can do it with:

I hope these tips helped you to understand how you can develop and test any OpenStack Python-based project without pain, and pretty quickly. On my environment, the whole deployment takes less than 20 minutes.

Please give any feedback in comment or via email!

by Emilien at January 20, 2020 02:32 PM

StackHPC Team Blog

StackHPC Winter Design Summit

Our team is becoming increasingly international, and while we work well as a virtual organisation, sometimes there is no substitute for gathering in a room and charting the course of the company for the ensuing months.

We have been holding design summits for the last few years, with the purpose of reviewing new technologies, considering improvements to our team processes and updating everyone on the growth and financial position of the firm.

In addition, we issue employee stock options to broaden the team's stake in the company's success.

With our growing team, and growing customer base, we spent a good deal of time discussing how we can continue to work as effectively as we do while the company grows and takes on new commitments to deliver. The agility of our working practices has been our strength, and we intend to keep it that way.

This is an exciting time to be working on the creation of high-performance cloud infrastructure, and our discussions reflected the pace of innovation occurring on many fronts. Watch this space for 2020!

(Image: map of StackHPC's design summit.)

Get in touch

If you would like to get in touch we would love to hear from you. Reach out to us via Twitter or directly via our contact page.

by Stig Telfer at January 20, 2020 09:00 AM

January 19, 2020

Adam Young

Shift on Stack: api_port failure

I finally got a right-sized flavor for an OpenShift deployment: 25 GB Disk, 4 VCPU, 16 GB Ram. With that, I tore down the old cluster and tried to redeploy. Right now, the deploy is failing at the stage of the controller nodes querying the API port. What is going on?

Here is the reported error on the console:

The IP address of 10.0.0.5 is attached to the following port:

$ openstack port list | grep "0.0.5"
| da4e74b5-7ab0-4961-a09f-8d3492c441d4 | demo-2tlt4-api-port       | fa:16:3e:b6:ed:f8 | ip_address='10.0.0.5', subnet_id='50a5dc8e-bc79-421b-aa53-31ddcb5cf694'      | DOWN   |

That final “DOWN” is the port state. It is also showing as detached. It is on the internal network:

Looking at the installer code, the one place I can find a reference to the api_port is in the template data/data/openstack/topology/private-network.tf, used to build the value openstack_networking_port_v2. This value is used quite heavily in the rest of the installer’s Go code.

Looking in the terraform data built by the installer, I can find references to both the api_port and openstack_networking_port_v2. Specifically, there are several objects of type openstack_networking_port_v2 with the names:

$ cat moc/terraform.tfstate  | jq -jr '.resources[] | select( .type == "openstack_networking_port_v2") | .name, ", ", .module, "\n" '
api_port, module.topology
bootstrap_port, module.bootstrap
ingress_port, module.topology
masters, module.topology

On a baremetal install, we need an explicit A record for api-int.<cluster_name>.<base_domain>. That requirement does not exist for OpenStack, however, and I did not have one the last time I installed.

api-int is the internal access to the API server. Since the controllers are hanging trying to talk to it, I assume that we are still at the stage where we are building the control plane, and that it should be pointing at the bootstrap server. However, since the port above is detached, traffic cannot get there. There are a few hypotheses in my head right now:

  1. The port should be attached to the bootstrap device
  2. The port should be attached to a load balancer
  3. The port should be attached to something that is acting like a load balancer.

I’m leaning toward 3 right now.

The install-config.yaml has the line:
octaviaSupport: "1"

But I don’t think any Octavia resources are being used.

$ openstack loadbalancer pool list

$ openstack loadbalancer list

$ openstack loadbalancer flavor list
Not Found (HTTP 404) (Request-ID: req-fcf2709a-c792-42f7-b711-826e8bfa1b11)

by Adam Young at January 19, 2020 12:55 AM

January 17, 2020

Mirantis

Best of 2019 Blogs, Part 4: Announcing Docker Enterprise 3.0 General Availability

Today, we’re excited to announce the general availability of Docker Enterprise 3.0 – the only desktop-to-cloud enterprise container platform enabling organizations to build and share any application and securely run them anywhere - from hybrid cloud to the edge.

by Docker Enterprise Team at January 17, 2020 04:00 PM

January 16, 2020

Fleio Blog

Fleio 2020.01: Operations user interface, improvements to the ticketing system, bug fixes and more

Fleio version 2020.01 is now available! The latest version was published today, 2020-01-16. Operations user interface: with the 2020.01 release we have added a new user interface feature, Operations. We decided to add this feature in order to improve the way that some tasks were being done and to make all the preparations to move […]

by Marian Chelmus at January 16, 2020 01:02 PM

January 15, 2020

Adam Young

Self Service Speedbumps

The OpenShift installer is fairly specific in what it requires, and will not install into a virtual machine that does not have sufficient resources. These limits are:

  • 16 GB RAM
  • 4 Virtual CPUs
  • 25 GB Disk Space

This is fairly frustrating if your cloud provider does not give you a flavor that matches this. The last item specifically is an artificial limitation as you can always create an additional disk and mount it, but the installer does not know to do that.

In my case, there is a flavor that almost matches; it has 10 GB of Disk space instead of the required 25. But I cannot use it.

Instead, I have to use a larger flavor that has double the VCPUs, and thus eats up more of my VCPU quota… to the point that I cannot afford more than 4 virtual machines of this size, and thus cannot create more than one compute node; OpenShift needs 3 nodes for the control plane.

I do not have permissions to create a flavor on this cloud. Thus, my only option is to open a ticket. Which has to be reviewed and acted upon by an administrator. Not a huge deal.

This is how self service breaks down: a non-security decision (linking disk size with the other characteristics of a flavor) plus access control rules that prevent end users from customizing. So the end user waits for a human to respond.

In my case, that means that I have to provide an alternative place to host my demonstration, just in case things don’t happen in time. Which costs my organization money.

This is not a ding on my cloud provider. They have the same OpenStack API as anyone else deploying OpenStack.

This is not a ding on Keystone; create flavor is not a project scoped operation, so I can’t even blame my favorite bug.

This is not a ding on the Nova API. It is reasonable to reserve the ability to create flavors to system administrators and, if instances have storage attached, to provide it in reasonably sized chunks.

My problem just falls at the junction of several different zones of responsibility. It is the overlap that causes the pain in this case. This is not unusual.

Would it be possible to have a more granular API, like “create customer flavor” that built a flavor out of pre-canned parts and sizes? Probably. That would solve my problem. I don’t know if this is a general problem, though.

This does seem like something that could be addressed by a GitOps-type approach. In order to perform an operation like this, I should be able to issue a command that gets checked in to git, confirmed, and posted for code review. An administrator could then confirm or provide an alternative approach. This happens in the ticketing system. It is human-resource-intensive. If no one says “yes”, the default is no… and the thing just sits there.

What would be a better long term solution? I don’t know. I’m going to let this idea set for a while.

What do you think?

by Adam Young at January 15, 2020 05:18 PM

January 10, 2020

Dan Prince

Keystone Operator Deploy/Upgrade on OpenShift

Keystone deploy and upgrade with an OpenShift/Kubernetes Operator

by Dan Prince at January 10, 2020 09:05 PM

January 06, 2020

VEXXHOST Inc.

Why are OpenStack Upgrades so Difficult?

Upgrades remain one of those concepts that everyone seems to feel is one of the Achilles' heels of OpenStack. Learn why they actually aren't difficult and, instead, why you may be making them difficult.

The post Why are OpenStack Upgrades so Difficult? appeared first on VEXXHOST.

by Mohammed Naser at January 06, 2020 02:41 PM

RDO

Community Blog Round Up 06 January 2020

Welcome to the new DECADE! It was super awesome to run the blog script and see not one, not two, but THREE new articles by the amazing Adam Young who tinkered with Keystone, TripleO, and containers over the break. And while Lars only wrote one article, it’s the ultimate guide to the Open Virtual Network within OpenStack. Sit back, relax, and inhale four great articles from the RDO Community.

Running the TripleO Keystone Container in OpenShift by Adam Young

Now that I can run the TripleO version of Keystone via podman, I want to try running it in OpenShift.

Read more at https://adam.younglogic.com/2019/12/running-the-tripleo-keystone-container-in-openshift/

Official TripleO Keystone Images by Adam Young

My recent forays into running containerized Keystone images have been based on a Centos base image with RPMs installed on top of it. But TripleO does not run this way; it runs via containers. Some notes as I look into them.

Read more at https://adam.younglogic.com/2019/12/official-tripleo-keystone-images/

OVN and DHCP: A minimal example by Lars Kellogg-Stedman

Introduction A long time ago, I wrote an article all about OpenStack Neutron (which at that time was called Quantum). That served as an excellent reference for a number of years, but if you’ve deployed a recent version of OpenStack you may have noticed that the network architecture looks completely different. The network namespaces previously used to implement routers and dhcp servers are gone (along with iptables rules and other features), and have been replaced by OVN (“Open Virtual Network”).

Read more at https://blog.oddbit.com/post/2019-12-19-ovn-and-dhcp/

keystone-db-init in OpenShift by Adam Young

Before I can run Keystone in a container, I need to initialize the database. This is as true for running in Kubernetes as it was using podman. Here’s how I got keystone-db-init to work.

Read more at https://adam.younglogic.com/2019/12/keystone-db-init-in-openshift/

by Rain Leander at January 06, 2020 12:52 PM

December 30, 2019

Slawek Kaplonski

Analyzing number of failed builds per patch in OpenStack projects

Short background: for the past few years I have been an OpenStack Neutron contributor, a core reviewer, and now even the project’s PTL. One of my responsibilities in the community is taking care of our CI system for Neutron. As part of this job I have to constantly check how various CI jobs are working and whether the reasons for their failures are related to the patch or to the CI itself.

December 30, 2019 10:56 PM

December 21, 2019

Adam Young

Running the TripleO Keystone Container in OpenShift

Now that I can run the TripleO version of Keystone via podman, I want to try running it in OpenShift.

Here is my first hack at a deployment yaml. Note that it looks really similar to the keystone-db-init I got to run the other day.

If I run it with:

oc create -f keystone-pod.yaml

I get a CrashLoopBackOff error, with the following in the logs:

$ oc logs pod/keystone-api 
 sudo -E kolla_set_configs
 sudo: unable to send audit message: Operation not permitted
 INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
 ERROR:__main__:Unexpected error:
 Traceback (most recent call last):
 File "/usr/local/bin/kolla_set_configs", line 412, in main
 config = load_config()
 File "/usr/local/bin/kolla_set_configs", line 294, in load_config
 config = load_from_file()
 File "/usr/local/bin/kolla_set_configs", line 282, in load_from_file
 with open(config_file) as f:
 IOError: [Errno 2] No such file or directory: '/var/lib/kolla/config_files/config.json' 

I modified the config.json to remove steps that were messing me up. I think I can now remove even that last config file, but I left it for now.

{
   "command": "/usr/sbin/httpd",
   "config_files": [
        {  
              "source": "/var/lib/kolla/config_files/src/*",
              "dest": "/",
              "merge": true,
              "preserve_properties": true
        }
    ],
    "permissions": [
	    {
            "path": "/var/log/kolla/keystone",
            "owner": "keystone:keystone",
            "recurse": true
        }
    ]
}

I need to add the additional files to a config map and mount those inside the container. For example, I can create a config map with the config.json file, a secret for the Fernet key, and a config map for the apache files.

oc create configmap keystone-files --from-file=config.json=./config.json
kubectl create secret generic keystone-fernet-key --from-file=../kolla/src/etc/keystone/fernet-keys/0
oc create configmap keystone-httpd-files --from-file=wsgi-keystone.conf=../kolla/src/etc/httpd/conf.d/wsgi-keystone.conf

Here is my final pod definition:

apiVersion: v1
kind: Pod
metadata:
  name: keystone-api
  labels:
    app: myapp
spec:
  containers:
  - image: docker.io/tripleomaster/centos-binary-keystone:current-tripleo 
    imagePullPolicy: Always
    name: keystone
    env:
    - name: KOLLA_CONFIG_FILE
      value: "/var/lib/kolla/config_files/src/config.json"
    - name: KOLLA_CONFIG_STRATEGY
      value: "COPY_ONCE"
    volumeMounts:
    - name: keystone-conf
      mountPath: "/etc/keystone/"
    - name: httpd-config
      mountPath: "/etc/httpd/conf.d"
    - name: config-json
      mountPath: "/var/lib/kolla/config_files/src"

    - name: keystone-fernet-key
      mountPath: "/etc/keystone/fernet-keys/0"
  volumes:
  - name: keystone-conf
    secret:
      secretName: keystone-conf
      items:
      - key: keystone.conf
        path: keystone.conf
        mode: 511	
  - name: keystone-fernet-key
    secret:
      secretName: keystone-fernet-key
      items:
      - key: "0"
        path: "0"
        mode: 511	
  - name: config-json
    configMap:
       name: keystone-files
  - name: httpd-config
    configMap:
       name: keystone-httpd-files

And show that it works for basic stuff:

$ oc rsh keystone-api
sh-4.2# curl 10.131.1.98:5000
{"versions": {"values": [{"status": "stable", "updated": "2019-07-19T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}], "id": "v3.13", "links": [{"href": "http://10.131.1.98:5000/v3/", "rel": "self"}]}]}}curl (HTTP://10.131.1.98:5000/): response: 300, time: 3.314, size: 266

Next steps: expose a route, make sure we can get a token.
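
A rough sketch of what those next steps could look like (service/route names, the route host and the password are placeholders):

    oc expose pod keystone-api --port=5000 --name=keystone
    oc expose service keystone
    curl -i http://<route-host>/v3/auth/tokens \
      -H "Content-Type: application/json" \
      -d '{"auth": {"identity": {"methods": ["password"], "password": {"user": {"name": "admin", "domain": {"id": "default"}, "password": "<password>"}}}}}'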

by Adam Young at December 21, 2019 12:31 AM

December 19, 2019

Adam Young

Official TripleO Keystone Images

My recent forays into running containerized Keystone images have been based on a Centos base image with RPMs installed on top of it. But TripleO does not run this way; it runs via containers. Some notes as I look into them.

The official containers for TripleO are currently hosted on docker.com. The Keystone page is here:

Don’t expect the docker pull command posted on that page to work. I tried a comparable one with podman and got:

$ podman pull tripleomaster/centos-binary-keystone
Trying to pull docker.io/tripleomaster/centos-binary-keystone...
  manifest unknown: manifest unknown
Trying to pull registry.fedoraproject.org/tripleomaster/centos-binary-keystone...

And a few more lines of error output. Thanks to Emilien M, I was able to get the right command:

$ podman pull tripleomaster/centos-binary-keystone:current-tripleo
Trying to pull docker.io/tripleomaster/centos-binary-keystone:current-tripleo...
Getting image source signatures
...
Copying config 9e85172eba done
Writing manifest to image destination
Storing signatures
9e85172eba10a2648ae7235076ada77b095ed3da05484916381410135cc8884c

Since I did this as a normal account, and not as root, the image does not get stored under /var, but instead goes somewhere under $HOME/.local. If I type

$ podman images
REPOSITORY                                       TAG               IMAGE ID       CREATED        SIZE
docker.io/tripleomaster/centos-binary-keystone   current-tripleo   9e85172eba10   2 days ago     904 MB

I can see the short form of the hash starting with 9e85. I use that to find the matching subdirectory under /home/ayoung/.local/share/containers/storage/overlay-images:

ls /home/ayoung/.local/share/containers/storage/overlay-images/9e85172eba10a2648ae7235076ada77b095ed3da05484916381410135cc8884c/

If I cat that file, I can see all of the layers that make up the image itself.

Trying a naive podman run docker.io/tripleomaster/centos-binary-keystone:current-tripleo, I get an error that shows just how Kolla-centric this image is:

$ podman run docker.io/tripleomaster/centos-binary-keystone:current-tripleo
+ sudo -E kolla_set_configs
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
ERROR:__main__:Unexpected error:
Traceback (most recent call last):
  File "/usr/local/bin/kolla_set_configs", line 412, in main
    config = load_config()
  File "/usr/local/bin/kolla_set_configs", line 294, in load_config
    config = load_from_file()
  File "/usr/local/bin/kolla_set_configs", line 282, in load_from_file
    with open(config_file) as f:
IOError: [Errno 2] No such file or directory: '/var/lib/kolla/config_files/config.json'

So I read the docs. Trying to fake it with:

$ podman run -e KOLLA_CONFIG='{}'   docker.io/tripleomaster/centos-binary-keystone:current-tripleo
+ sudo -E kolla_set_configs
INFO:__main__:Validating config file
ERROR:__main__:InvalidConfig: Config is missing required "command" key

When running with TripleO, the config files are generated from Heat Templates. The values for the config.json come from here.
This gets me slightly closer:

podman run  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -e KOLLA_CONFIG='{"command": "/usr/sbin/httpd"}'   docker.io/tripleomaster/centos-binary-keystone:current-tripleo

But I still get an error of “no listening sockets available, shutting down” even if I try this as root. Below is the whole thing I tried to run.

$ podman run   -v $PWD/fernet-keys:/var/lib/kolla/config_files/src/etc/keystone/fernet-keys   -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -e KOLLA_CONFIG='{ "command": "/usr/sbin/httpd", "config_files": [ { "source": "/var/lib/kolla/config_files/src/etc/keystone/fernet-keys", "dest": "/etc/keystone/fernet-keys", "owner":"keystone", "merge": false, "perm": "0600" } ], "permissions": [ { "path": "/var/log/kolla/keystone", "owner": "keystone:keystone", "recurse": true } ] }'  docker.io/tripleomaster/centos-binary-keystone:current-tripleo

Let's go back to simple things. What is inside the container? We can peek using:

$ podman run docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls

Basically, we can perform any command that will not last longer than the failed kolla initialization. No Bash prompts, but shorter single-line bash commands work. We can see that the MySQL connection is unconfigured:

 podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo cat /etc/keystone/keystone.conf | grep "connection ="
#connection = 

What about those config files that the initialization wants to copy:

podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls /var/lib/kolla/config_files/src/etc/httpd/conf.d
ls: cannot access /var/lib/kolla/config_files/src/etc/httpd/conf.d: No such file or directory

So all that comes from external to the container, and is mounted at run time.

$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo cat /etc/passwd  | grep keystone
keystone:x:42425:42425::/var/lib/keystone:/usr/sbin/nologin

This user owns the config and the log files.

$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls -la /var/log/keystone
total 8
drwxr-x---. 2 keystone keystone 4096 Dec 17 08:28 .
drwxr-xr-x. 6 root     root     4096 Dec 17 08:28 ..
-rw-rw----. 1 root     keystone    0 Dec 17 08:28 keystone.log
$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo ls -la /etc/keystone
total 128
drwxr-x---. 2 root     keystone   4096 Dec 17 08:28 .
drwxr-xr-x. 2 root     root       4096 Dec 19 16:30 ..
-rw-r-----. 1 root     keystone   2303 Nov 12 02:15 default_catalog.templates
-rw-r-----. 1 root     keystone 104220 Dec 14 01:09 keystone.conf
-rw-r-----. 1 root     keystone   1046 Nov 12 02:15 logging.conf
-rw-r-----. 1 root     keystone      3 Dec 14 01:09 policy.json
-rw-r-----. 1 keystone keystone    665 Nov 12 02:15 sso_callback_template.html
$ podman run  docker.io/tripleomaster/centos-binary-keystone:current-tripleo cat /etc/keystone/policy.json
{}

Yes, policy.json is empty.

Let's go back to the config file. I would rather not pass in all the config info as an environment variable each time. If I run as root, I can use the podman bind-mount option to relabel it:

 podman run -e KOLLA_CONFIG_FILE=/config.json  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -v $PWD/config.json:/config.json:z   docker.io/tripleomaster/centos-binary-keystone:current-tripleo  

This eventually fails with the error message “no listening sockets available, shutting down”, which seems to be due to the lack of httpd.conf entries for Keystone:

# podman run -e KOLLA_CONFIG_FILE=/config.json  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -v $PWD/config.json:/config.json:z   docker.io/tripleomaster/centos-binary-keystone:current-tripleo  ls /etc/httpd/conf.d
auth_mellon.conf
auth_openidc.conf
autoindex.conf
README
ssl.conf
userdir.conf
welcome.conf

The clue seems to be in the Heat Templates. There are a bunch of files that are expected to be in /var/lib/kolla/config_files/src inside the container. Here’s my version of the WSGI config file:

Listen 5000
Listen 35357

ServerSignature Off
ServerTokens Prod
TraceEnable off

ErrorLog "/var/log/kolla/keystone/apache-error.log"
<IfModule log_config_module>
    CustomLog "/var/log/kolla/keystone/apache-access.log" common
</IfModule>

LogLevel info

<Directory "/usr/bin">
    <FilesMatch "^keystone-wsgi-(public|admin)$">
        AllowOverride None
        Options None
        Require all granted
    </FilesMatch>
</Directory>

<VirtualHost *:5000>
    WSGIDaemonProcess keystone-public processes=5 threads=1 user=keystone group=keystone display-name=%{GROUP} python-path=/usr/lib/python2.7/site-packages
    WSGIProcessGroup keystone-public
    WSGIScriptAlias / /usr/bin/keystone-wsgi-public
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
    <IfVersion >= 2.4>
      ErrorLogFormat "%{cu}t %M"
    </IfVersion>
    ErrorLog "/var/log/kolla/keystone/keystone-apache-public-error.log"
    LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"" logformat
    CustomLog "/var/log/kolla/keystone/keystone-apache-public-access.log" logformat
</VirtualHost>

<VirtualHost *:35357>
    WSGIDaemonProcess keystone-admin processes=5 threads=1 user=keystone group=keystone display-name=%{GROUP} python-path=/usr/lib/python2.7/site-packages
    WSGIProcessGroup keystone-admin
    WSGIScriptAlias / /usr/bin/keystone-wsgi-admin
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
    <IfVersion >= 2.4>
      ErrorLogFormat "%{cu}t %M"
    </IfVersion>
    ErrorLog "/var/log/kolla/keystone/keystone-apache-admin-error.log"
    LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"" logformat
    CustomLog "/var/log/kolla/keystone/keystone-apache-admin-access.log" logformat
</VirtualHost>

So with a directory structure like this:

[root@ayoungP40 kolla]# find src/ -print
src/
src/etc
src/etc/keystone
src/etc/keystone/fernet-keys
src/etc/keystone/fernet-keys/1
src/etc/keystone/fernet-keys/0
src/etc/httpd
src/etc/httpd/conf.d
src/etc/httpd/conf.d/wsgi-keystone.conf

And a Kolla config.json file like this:

{
   "command": "/usr/sbin/httpd",
   "config_files": [
        {
              "source": "/var/lib/kolla/config_files/src/etc/keystone/fernet-keys",
              "dest": "/etc/keystone/fernet-keys",
              "merge": false,
              "preserve_properties": true
        },{
              "source": "/var/lib/kolla/config_files/src/etc/httpd/conf.d",
              "dest": "/etc/httpd/conf.d",
              "merge": false,
              "preserve_properties": true
        },{  
              "source": "/var/lib/kolla/config_files/src/*",
              "dest": "/",
              "merge": true,
              "preserve_properties": true
        }
    ],
    "permissions": [
	    {
            "path": "/var/log/kolla/keystone",
            "owner": "keystone:keystone",
            "recurse": true
        }
    ]
}

I can run Keystone like this:

podman run -e KOLLA_CONFIG_FILE=/config.json  -e KOLLA_CONFIG_STRATEGY=COPY_ONCE   -v $PWD/config.json:/config.json:z   -v $PWD/src:/var/lib/kolla/config_files/src:z  docker.io/tripleomaster/centos-binary-keystone:current-tripleo

by Adam Young at December 19, 2019 09:00 PM

OpenStack Superuser

The 10th China Open Source Hackathon Recap: Projects, Talks, and More

This week, the 10th China Open Source Hackathon was held in Beijing. Since its first event in 2015, the China Open Source Hackathon has been held ten times, and this week’s event featured OpenStack, Kata Containers and StarlingX. Although it snowed heavily in Beijing this week, it did not cool down the developers’ enthusiasm for this Hackathon. Without further ado, Superuser collected the activities that you might have missed.

Kata Containers:

At this Hackathon, five developers from Ant Financial and Alibaba demonstrated two important features of the coming 2.0 dev cycle. Among them, Tao Peng and Eryu Guan demoed an image system named Nydus that they have designed. Nydus uses the new developments of OCI artifacts and virtio-fs, combined with the OCI image community’s future evolution direction, and considers isolation, pull speed, memory efficiency, and more to provide a reference for the image design of Kata 2.0.

The Kata Containers developers also modified Kata at the Hackathon to support parsing the Nydus rootfs mount format, enabling extremely fast startup of Kata Containers. In addition, aiming to reduce Kata’s resource consumption, Hui Zhu, Bo Yang, and Fupan Li reimplemented kata-agent in Rust and replaced gRPC with rust-ttrpc, making the corresponding modifications to the Kata runtime. They also demoed how to start Kata Containers with a combination of Cloud Hypervisor + Rust kata-agent + ttrpc, and compared it with the current 1.x version of Kata (a combination of QEMU + Go kata-agent + gRPC). The original kata-agent, implemented in Go with gRPC, consumed about 11M of anonymous pages at runtime, while the kata-agent implemented in Rust with ttrpc consumed only about 500K, greatly reducing the memory consumed by Kata Containers itself.

OpenStack:

As a project featured since the first China Open Source Hackathon, OpenStack has participated in this event all ten times. The enthusiasm of the OpenStack Cinder, Cyborg and Nova project developers continues to energize everyone. Since this Hackathon happened at the beginning of the OpenStack Ussuri cycle, OpenStack developers not only reviewed bugs and submitted patches, but also discussed points from the specs that will help people propose their features in Ussuri.

Another spotlight on OpenStack at this Hackathon was OpenStack Tricircle. On the first day of the Hackathon, a presentation delivered by Professor Fangming Liu from the Huazhong University of Science and Technology helped attendees learn more about OpenStack Tricircle. OpenStack Tricircle provides networking automation across Neutron servers in multi-region OpenStack clouds. The clouds are supported by geo-distributed datacenters and deployed in multiple regions. The OpenStack Tricircle team has also collaborated with Huawei as well as other members of the OpenStack community.

StarlingX

In this two-day Hackathon, the StarlingX community cleared the path for Ceph containerization in StarlingX, a big step toward cloud native. To continuously enhance StarlingX network manageability, the community is also looking at the feasibility of integrating SDN solutions.

Similar to previous China Open Source Hackathons, the StarlingX team organized a mini meetup and an open technical discussion on StarlingX 4.0 features, covering Ceph containerization and an update to the small-node blueprint spec. The StarlingX team also had a tech discussion with the Juniper Networks team, who shared Tungsten Fabric SDN feature sets, architecture and a few BGP VPN solutions (such as VSNX, CSNX). The StarlingX developers helped onboard the JITStack team as new community contributors, and they also discussed the China Unicom Wo Cloud team’s Industry Edge solution to align it with the community roadmap.

This week, the StarlingX community not only officially released the StarlingX 3.0 release on Monday, but also won this year’s China Excellent Open Source Project Award at the 9th China Cloud Computing Standards and Application Conference. Congratulations to the StarlingX community and the community members who contributed code! 

The post The 10th China Open Source Hackathon Recap: Projects, Talks, and More appeared first on Superuser.

by Sunny Cai at December 19, 2019 07:33 PM

December 18, 2019

Adam Young

keystone-db-init in OpenShift

Before I can run Keystone in a container, I need to initialize the database. This is as true for running in Kubernetes as it was using podman. Here’s how I got keystone-db-init to work.

The general steps were:

  • use oc new-app to generate the build-config and build (a sketch of this invocation follows the list)
  • delete the deployment config generated by new-app
  • upload a secret containing keystone.conf
  • deploy a pod that uses the image built above and the secret version of keystone.conf to run keystone-manage db_init
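
For the first step, the exact oc new-app invocation is not shown in these notes; a hedged sketch, where the repository URL and context directory are hypothetical placeholders rather than the actual source used, would look like:

oc new-app https://github.com/example/container-keystone.git --context-dir=keystone-db-init --name=keystone-db-init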

To delete the deployment config generated by new-app:

oc delete deploymentconfig.apps.openshift.io/keystone-db-in

To upload the secret.

kubectl create secret generic keystone-conf --from-file=../keystone-db-init/keystone.conf

Here is the yaml definition for the pod

apiVersion: v1
kind: Pod
metadata:
  name: keystone-db-init-pod
  labels:
    app: myapp
spec:
  containers:
  - image: image-registry.openshift-image-registry.svc:5000/keystone/keystone-db-init
    imagePullPolicy: Always
    name: keystone-db-init
    command: ['sh', '-c', 'cat /etc/keystone/keystone.conf']
    volumeMounts:
    - name: keystone-conf
      mountPath: "/etc/keystone/"
  volumes:
  - name: keystone-conf
    secret:
      secretName: keystone-conf
      items:
      - key: keystone.conf
        path: keystone.conf
        mode: 511

While this is running as the keystone unix account, I am not certain how that happened. I did use the patch command I talked about earlier on the deployment config, but you can see I am not using that in this pod. That is something I need to straighten out.
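
A couple of checks that might explain it (a sketch; the keystone namespace name is inferred from the image path above, and the annotations are the standard ones OpenShift sets at admission time):

# Which SCC admitted the pod
oc get pod keystone-db-init-pod -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
# The UID range the project hands out under the restricted SCC
oc get namespace keystone -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}{"\n"}'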

To test that the database was initialized:

$ oc get pods -l app=mariadb-keystone
NAME                       READY   STATUS    RESTARTS   AGE
mariadb-keystone-1-rxgvs   1/1     Running   0          9d
$ oc rsh mariadb-keystone-1-rxgvs
sh-4.2$ mysql -h mariadb-keystone -u keystone -pkeystone keystone
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 908
Server version: 10.2.22-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [keystone]> show tables;
+------------------------------------+
| Tables_in_keystone                 |
+------------------------------------+
| access_rule                        |
| access_token                       |
....
+------------------------------------+
46 rows in set (0.00 sec)

I’ve fooled myself in the past into thinking that things have worked when they have not. To make sure I am not doing that now, I dropped the keystone database and recreated it from inside the mysql monitor program. I then re-ran the pod and was able to see all of the tables.

by Adam Young at December 18, 2019 08:48 PM

December 17, 2019

OpenStack Superuser

Spreading OpenStack in Roots of Education: Open Infra Institute Day Kollam (Kerala)

Kollam, India—Bangalore based OpenStack enthusiasts and members of OpenTech Foundation gathered together for OpenStack Technology Institute Day at the Amritha University campus in Kollam, Kerala, India. It was well-received by many students and raised interesting queries, which Dell’s Principal Architect Prakash Ramchandran and VMware’s Dr. Ganesh Hiregoudar answered. Along with both of them, Institute Day was driven by Professor Vipin with FOSS updates, and Calsoft‘s Digambar Patil was around to mentor students on upstream contributions, including setting up the environment, submitting patches, etc.

Here are some glimpses of the event.

About OpenStack Technology Institute Day: 

OpenStack Technology Institute Day is a program to share knowledge about the different ways of contributing to OpenStack, such as providing new features, writing documentation, participating in working groups, and so forth. The educational program is built on the principle of open collaboration and teaches students how to find information and navigate the intricacies of the project’s technical tools and social interactions in order to get their contributions accepted. The training focuses on hands-on practice: students can use a prepared development environment to learn how to test, prepare and upload new code snippets or documentation for review. Attendees are also given the opportunity to join a mentoring program to get further help and guidance on their journey to becoming active and successful members of the OpenStack community.

About the author

Sagar Nangare is a technology blogger, focusing on data center technologies (networking, telecom, cloud, storage) and emerging domains like edge computing, IoT, machine learning, AI). He works at Calsoft Inc. as a digital strategist.

The post Spreading OpenStack in Roots of Education: Open Infra Institute Day Kollam (Kerala) appeared first on Superuser.

by Sagar Nangare at December 17, 2019 08:00 AM

December 16, 2019

RDO

Community Blog Round Up 16 December 2019

We’re super chuffed that there’s already another article to read in our weekly blog round up – as we said before, if you write it, we’ll help others see it! But if you don’t write it, well, there’s nothing to set sail. Let’s hear about your latest adventures on the Ussuri river and if you’re NOT in our database, you CAN be by creating a pull request to https://github.com/redhat-openstack/website/blob/master/planet.ini.

Reading keystone.conf in a container by Adam Young

Step 3 of the 12 Factor app is to store config in the environment. For Keystone, the set of configuration options is controlled by the keystone.conf file. In an earlier attempt at containerizing the scripts used to configure Keystone, I had passed an environment variable into the script that would then be written to the configuration file. I realize now that I want the whole keystone.conf external to the application. This allows me to set any of the configuration options without changing the code in the container. More importantly, it allows me to make the configuration information immutable inside the container, so that the applications cannot be hacked to change their own configuration options.

Read more at https://adam.younglogic.com/2019/12/reading-keystone-conf-in-a-container/

by Rain Leander at December 16, 2019 11:45 AM

December 12, 2019

OpenStack Superuser

Ethernet VPN Deployment Automation with OpenStack and ODL Controller

The cool thing about OpenStack is its tight integration with SDN solutions like OpenDaylight to separate network traffic, scale on demand and enable centralized control of geographically distributed data centers. In this article, we will talk about a proposed SDN-based architecture in which OpenStack and OpenDaylight are used to automate the deployment of VPN instances (Ethernet VPN in this case), centrally manage them, regularly update network policies, and improve the scalability and response time of the VPNs.

Problem with Interconnection of data centers with L2VPN

Virtual Private Networks (VPNs) are generally used to interconnect geographically distributed data centers. Many generations of VPN technology have been introduced to address the connectivity needs between different sites. Layer 2 VPN (L2VPN) is widely used by organizations due to its flexibility and transparency. L2VPN uses the Virtual Private LAN Service (VPLS) to connect different data centers. The main advantage of VPLS is that it can extend a VLAN to data centers. But VPLS has its own barriers in terms of redundancy, scalability, flexibility, and limited forwarding policies. Internet Service Providers (ISPs), however, use Multiprotocol Label Switching (MPLS) for data center interconnection because of its flexibility and ease of deployment. That triggers the need for a VPN technology designed for MPLS. This is where Ethernet VPN (EVPN) comes in, which addresses the concerns and challenges associated with using a VPN over MPLS. EVPN simply enables an L2 VPN connection over MPLS.

The core problem with EVPN was the manual configuration and management of EVPN instances, which can cause huge time consumption, error-prone configuration and high OPEX.

An SDN Based Solution

To address the problem, an SDN-based architecture was proposed by researchers and engineers from Karlstad University and Ericsson. It uses the OpenDaylight SDN controller and OpenStack for automated remote deployment and automation of EVPN-related tasks.

The solution offered in this paper mainly addresses two existing limitations: one is flexible network management automation, and the other is the control-plane complexity of MPLS-based VPNs, while providing flexibility for adding new network changes.

Architecture

Before we dive into the architecture, let’s talk about why EVPN is the key technology that lets this solution run dynamically over MPLS. EVPN uses MP-BGP in its control plane as a signaling method to advertise addresses, which removes the need for traditional flood-and-learn in the data plane. In EVPN, the control and data planes are abstracted and separated. That allows MPLS and Provider Backbone Bridging to be used together with the EVPN control plane.

SDN Based EVPN Deployment Automation Architecture

The above architecture depicts model-driven network management and automation of EVPN instances. In this model, the YANG data modeling language is used to define services and configurations, represent state data and process notifications. Configuration data defined in a YANG file is transmitted to network devices. The NETCONF protocol is used for transmission of the configuration along with installation, deletion, and manipulation of the configuration of network devices. Transmitted messages are encoded in XML. NETCONF lets the administrator push the data, validate the configuration and, after successful execution, commit the changes to the network devices. The SDN controller leverages NETCONF to automate the configuration of EVIs on provider edge routers.

Let’s understand the role of key components in the architecture

OpenStack: It is used as the central cloud platform to orchestrate the management of EVPNs using the SDN controller. The OpenStack Neutron API is used to communicate with the ODL SDN controller to manage the EVPN instances attached to the network.

OpenDaylight SDN Controller: It is the core element of this architecture. It extends the Multiprotocol Border Gateway Protocol (MP-BGP) inside the OpenDaylight controller with the MP-BGP control plane (EVPN instances on the provider edge/data center), and its VPNService automates EVPN configuration using YANG and NETCONF. This bypasses the slow and error-prone tasks of manual EVPN configuration.

Open vSwitch (OVS): This switch sits inside the OpenStack compute nodes. It is used to isolate the traffic among different VMs and connect them to the physical network.

Provider Edge (PE) routers: The PE acts as a middleware for the data centers and supports EVPN and MP-BGP extensions as well as NETCONF and YANG.

The architecture solution above has been evaluated; you can refer to the paper for the test results.

 

The post Ethernet VPN Deployment Automation with OpenStack and ODL Controller appeared first on Superuser.

by Sagar Nangare at December 12, 2019 02:00 PM

Adam Young

Reading keystone.conf in a container

Step 3 of the 12 Factor app is to store config in the environment. For Keystone, the set of configuration options is controlled by the keystone.conf file. In an earlier attempt at containerizing the scripts used to configure Keystone, I had passed an environment variable into the script that would then be written to the configuration file. I realize now that I want the whole keystone.conf external to the application. This allows me to set any of the configuration options without changing the code in the container. More importantly, it allows me to make the configuration information immutable inside the container, so that the applications cannot be hacked to change their own configuration options.

I was running the pod and mounting the local copy I had of the keystone.conf file using this command line:

podman run --mount type=bind,source=/home/ayoung/devel/container-keystone/keystone-db-init/keystone.conf,destination=/etc/keystone/keystone.conf:Z --add-host keystone-mariadb:10.89.0.47   --network maria-bridge  -it localhost/keystone-db-init 

It was returning with no output. To diagnose, I added on /bin/bash to the end of the command so I could poke around inside the running container before it exited.

podman run --mount /home/ayoung/devel/container-keystone/keystone-db-init/keystone.conf:/etc/keystone/keystone.conf    --add-host keystone-mariadb:10.89.0.47   --network maria-bridge  -it localhost/keystone-db-init /bin/bash

Once inside, I was able to look at the keystone log file. A stack trace made me realize that I was not able to actually read the file /etc/keystone/keystone.conf. Using ls, it would show up like this:

-?????????? ? ?        ?             ?            ? keystone.conf:

It took a lot of trial and error to rectify it, including:

  • adding a parallel entry for the keystone user and group to my host's /etc/passwd and /etc/group files (see the sketch after this list)
  • ensuring that the file was owned by keystone outside the container
  • switching to the -v option to create the bind mount, as that allowed me to use the :Z option as well
  • adding the -u keystone option to the command line
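
A sketch of that first bullet; the UID/GID value is only an example taken from the official image inspected in a later post, so check what your own image actually uses:

# Find the keystone UID/GID inside the image (example output would drive the numbers below)
podman run localhost/keystone-db-init id keystone
# Mirror it on the host so the bind-mounted file can be owned by the same numeric user
sudo groupadd -g 42425 keystone
sudo useradd -u 42425 -g 42425 -d /var/lib/keystone -s /sbin/nologin keystone
sudo chown keystone:keystone keystone.conf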

The end command looked like this:

podman run -v /home/ayoung/devel/container-keystone/keystone-db-init/keystone.conf:/etc/keystone/keystone.conf:Z  -u keystone         --add-host keystone-mariadb:10.89.0.47   --network maria-bridge  -it localhost/keystone-db-init 

Once I had it correct, I could use the /bin/bash executable to again poke around inside the container. From the inside, I could run:

$ keystone-manage db_version
109
$ mysql -h keystone-mariadb -ukeystone -pkeystone keystone  -e "show databases;"
+--------------------+
| Database           |
+--------------------+
| information_schema |
| keystone           |
+--------------------+

Next up is to try this with OpenShift.

by Adam Young at December 12, 2019 12:09 AM

December 10, 2019

OpenStack Superuser

Unleashing the OpenStack “Train”: Contribution from Intel and Inspur

The OpenStack community released the latest version, “Train”, on October 16th. As Platinum and Gold members of OpenStack Foundation, Intel and Inspur OpenStack teams are actively contributing to the community projects, such as Nova, Neutron, Cinder, Cyborg, and others. During the Train development cycle, both companies collaborated, contributed to and completed multiple achievements. This includes 4 blueprints and design specifications in Train, commits, reviews and more, and reflects the high level of contribution to the development of OpenStack code base.

In early September 2019, Intel and Inspur worked together and used the InCloud OpenStack 5.6 (ICOS 5.6) to validate a single cluster deployment with 200 and 500 nodes. This created a solid foundational reference architecture for OpenStack in a large-scale single cluster environment. Intel and Inspur closely monitor the latest development updates in the community and upgraded ICOS5.6 to support new features of Train. For example, while validating the solution, a networking bottleneck issue (Neutron IPAM DLM and IP address allocation) was found in a large-scale high concurrency provisioning scenario (e.g. >800 VM creation). After applying a distributed lock solution with etcd, the network creation process was optimized and significantly improved system performance. The team also worked on Nova project to provide “delete on termination” feature for VM volumes. This greatly improves operation efficiency for cloud administrators. Another important new feature “Nova VPMEM” is also included in OpenStack “Train” release. This feature can guarantee persistent data storage functionality across power cycles, at a lower cost and larger capacity compared to DRAM. This can significantly improve workload performance for applications such as Redis, Rocksdb, SAP HANA, Aerospike, etc.

Intel and Inspur shared many of the engineering best practices at the recent Shanghai Open Infrastructure Summit, including resources for 500 node large-scale cluster deployment in relevant sessions such as “full stack security chain of trust and best practices in cloud”, “improving private cloud performance for big data analytics workloads”, and more.

Chief Architect of Intel Data Center Group, Enterprise & Government for China Division, Dr Yih Leong Sun said: Intel is actively contributing to the OpenStack upstream community and will continue to improve OpenStack architecture with Intel’s latest technology. We strive to build a software defined infrastructure, optimized at both the software and hardware layer, and to deliver an Open Cloud solution that meets the workload performance requirements of the industry.

Vice President of Inspur Group, Zhang Dong indicated: Inspur is increasingly investing more on upstream community and contributing our knowledge and experience with industry deployment and usage. We continue to strengthen our technical leadership and contribution in the community, to help users solve real-world challenges, and to promote the OpenStack adoption.

 

Photo // CC BY NC

The post Unleashing the OpenStack “Train”: Contribution from Intel and Inspur appeared first on Superuser.

by Brin Zhang and Lily Wu at December 10, 2019 08:00 AM

December 09, 2019

RDO

Community Blog Round Up 09 December 2019

As we sail down the Ussuri river, Ben and Colleen report on their experiences at Shanghai Open Infrastructure Summit while Adam dives into Buildah.

Let’s Buildah Keystoneconfig by Adam Young

Buildah is a valuable tool in the container ecosystem. As an effort to get more familiar with it, and to finally get my hand-rolled version of Keystone to deploy on Kubernetes, I decided to work through building a couple of Keystone based containers with Buildah.

Read more at https://adam.younglogic.com/2019/12/buildah-keystoneconfig/

Oslo in Shanghai by Ben Nemec

Despite my trepidation about the trip (some of it well-founded!), I made it to Shanghai and back for the Open Infrastructure Summit and Project Teams Gathering. I even managed to get some work done while I was there. 🙂

Read more at http://blog.nemebean.com/content/oslo-shanghai

Shanghai Open Infrastructure Forum and PTG by Colleen Murphy

The Open Infrastructure Summit, Forum, and Project Teams Gathering was held last week in the beautiful city of Shanghai. The event was held in the spirit of cross-cultural collaboration and attendees arrived with the intention of bridging the gap with a usually faraway but significant part of the OpenStack community.

Read more at http://www.gazlene.net/shanghai-forum-ptg.html

by Rain Leander at December 09, 2019 12:24 PM

December 06, 2019

OpenStack Superuser

A Guide to Kubernetes Etcd: All You Need to Know to Set up Etcd Clusters

We all know Kubernetes is a distributed platform that orchestrates different worker nodes and can be controlled by central master nodes. There can be any number of worker nodes distributed to handle pods. To keep track of all changes and updates to these nodes and pass on the desired actions, Kubernetes uses etcd.

What is etcd in Kubernetes?

Etcd is a distributed, reliable key-value store that is simple, fast and secure. It acts as a backend for service discovery and as a database, runs on different servers in Kubernetes clusters at the same time to monitor changes in the clusters, and stores the state and configuration data that needs to be accessed by the Kubernetes master or the clusters. Additionally, etcd allows the Kubernetes master to support a discovery service so that deployed applications can declare their availability for inclusion in a service.

The API server component on the Kubernetes master nodes communicates with etcd on behalf of the components spread across the cluster. Etcd is also used to set up the desired state for the system.

As the key-value store for Kubernetes, etcd stores all configuration for Kubernetes clusters. It is different from a traditional database, which stores data in tabular form: etcd creates a database page for each record, so updating one record does not hamper the others. For example, a few records may require additional columns that are not required by other records in the same database; in a traditional database this creates redundancy, while etcd simply adds and manages each record in a reliable way for Kubernetes.
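
For illustration, this is what reading and writing keys looks like with the etcd v3 CLI (a generic example; Kubernetes stores its own objects under the /registry prefix through the API server rather than by hand like this):

etcdctl put /config/example/replicas "3"
etcdctl get /config/example/replicas
etcdctl get /config/example --prefix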

Distributed and Consistent

Etcd stores critical data for Kubernetes. Being distributed, it maintains a copy of the data store on each member across distributed machines/servers. Every copy is identical and holds the same data as every other etcd data store, so if one copy gets destroyed, the other two still hold the same information.

Deployment Methods for etcd in Kubernetes Clusters

Etcd is architected in such a way as to enable high availability in Kubernetes. Etcd can be deployed as pods on the master nodes:

Figure – etcd in the same cluster

Image source: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

It can also be deployed externally to enable resiliency and security

Figure – etcd deployed externally

Image source: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/

How etcd Works

Etcd acts as the brain of the Kubernetes cluster. Monitoring the sequence of changes is done using etcd's 'watch' function. With this function, Kubernetes can subscribe to changes within clusters and execute any state request coming from the API server. Etcd coordinates with the different components within the distributed cluster: when the state of a component changes, etcd reacts, and other components can in turn react to that change.
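
The watch mechanism can be tried directly from the CLI (a sketch; Kubernetes itself consumes watches over etcd's gRPC API rather than through etcdctl):

# Terminal 1: subscribe to every change under a prefix
etcdctl watch /config/example --prefix
# Terminal 2: any write under that prefix now appears in terminal 1
etcdctl put /config/example/replicas "4"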

There might be a situation where, while maintaining the same copy of all state among the group of etcd members in a cluster, the same data needs to be stored in two etcd instances. However, etcd is not supposed to update the same record independently in different instances.

In such cases, etcd does not process the writes on each cluster node. Instead, only one of the instances gets the responsibility of processing the writes internally. That node is called the leader. The other nodes in the cluster elect a leader using the Raft algorithm. Once the leader is elected, the other nodes become followers of the leader.

Now, when a write request comes to the leader node, the leader processes the write. The leader etcd node broadcasts a copy of the data to the other nodes. If one of the follower nodes is not active or is offline at that moment, the write request is completed based on the majority of available nodes. Normally, the write gets the complete flag once the leader gets consent from the other members in the cluster.

This is how the members elect a leader among themselves and how they ensure a write is propagated across all instances. This distributed consensus is implemented in etcd using the Raft protocol.
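
With the cluster built in the steps below, the current leader can be inspected like this (reusing the endpoints and certificates from that setup; the flags are standard etcdctl v3 options):

sudo ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem

The IS LEADER column in the table shows which member currently holds leadership.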

How Clusters Work in etcd 

Kubernetes is the main consumer of the etcd project, which was initiated by CoreOS. Etcd has become the norm for tracking the functionality and overall state of Kubernetes cluster pods. Kubernetes allows various cluster architectures that may involve etcd as a crucial component on the masters, or may involve multiple master nodes with etcd as an isolated component.

The role of etcd changes with the system configuration of any particular architecture. Such dynamic placement of etcd to manage clusters can be implemented to improve scaling. The result is workloads that are easily supported and managed.

Here are the steps for initiating etcd in Kubernetes.

Wget the etcd files:

wget -q --show-progress --https-only --timestamping "https://github.com/etcd-io/etcd/releases/download/v3.4.0/etcd-v3.4.0-linux-amd64.tar.gz"

Tar and install the etcd server and the etcdctl tools:

{
  tar -xvf etcd-v3.4.0-linux-amd64.tar.gz
  sudo mv etcd-v3.4.0-linux-amd64/etcd* /usr/local/bin/
}

{
  sudo mkdir -p /etc/etcd /var/lib/etcd
  sudo cp ca.pem kubernetes-key.pem kubernetes.pem /etc/etcd/
}

Get the internal IP address of the current compute instance. It will be used to handle client requests and data transmission with the etcd cluster peers:

INTERNAL_IP=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip)

Set a unique name for etcd, matching the hostname of the current compute instance:

ETCD_NAME=$(hostname -s)

Create the etcd.service systemd unit file:

cat <<EOF | sudo tee /etc/systemd/system/etcd.service
[Unit]
Description=etcd
Documentation=https://github.com/coreos

[Service]
Type=notify
ExecStart=/usr/local/bin/etcd \\
  --name ${ETCD_NAME} \\
  --cert-file=/etc/etcd/kubernetes.pem \\
  --key-file=/etc/etcd/kubernetes-key.pem \\
  --peer-cert-file=/etc/etcd/kubernetes.pem \\
  --peer-key-file=/etc/etcd/kubernetes-key.pem \\
  --trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-trusted-ca-file=/etc/etcd/ca.pem \\
  --peer-client-cert-auth \\
  --client-cert-auth \\
  --initial-advertise-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-peer-urls https://${INTERNAL_IP}:2380 \\
  --listen-client-urls https://${INTERNAL_IP}:2379,https://127.0.0.1:2379 \\
  --advertise-client-urls https://${INTERNAL_IP}:2379 \\
  --initial-cluster-token etcd-cluster-0 \\
  --initial-cluster controller-0=https://10.240.0.10:2380,controller-1=https://10.240.0.11:2380,controller-2=https://10.240.0.12:2380 \\
  --initial-cluster-state new \\
  --data-dir=/var/lib/etcd
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

 

Initiate etcd Server

{
  sudo systemctl daemon-reload
  sudo systemctl enable etcd
  sudo systemctl start etcd
}

Repeat the above commands on controller-0, controller-1, and controller-2.

List the etcd cluster members:

sudo ETCDCTL_API=3 etcdctl member list \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.pem \
  --cert=/etc/etcd/kubernetes.pem \
  --key=/etc/etcd/kubernetes-key.pem

Output:

3a57933972cb5131, started, controller-2, https://10.240.0.12:2380, https://10.240.0.12:2379
f98dc20bce6225a0, started, controller-0, https://10.240.0.10:2380, https://10.240.0.10:2379
ffed16798470cab5, started, controller-1, https://10.240.0.11:2380, https://10.240.0.11:2379

Conclusion

Etcd is an independent project at its core, but it has been used extensively by the Kubernetes community to manage cluster state and enable further automation for dynamic workloads. The key benefit of using etcd with Kubernetes is that etcd is itself a distributed database that aligns well with distributed Kubernetes clusters. So, using etcd with Kubernetes is vital for the health of the clusters.

About the author

Sagar Nangare is a technology blogger, focusing on data center technologies (networking, telecom, cloud, storage) and emerging domains like edge computing, IoT, machine learning, AI). He works at Calsoft Inc. as a digital strategist.

The post A Guide to Kubernetes Etcd: All You Need to Know to Set up Etcd Clusters appeared first on Superuser.

by Sagar Nangare at December 06, 2019 01:00 PM

December 03, 2019

Adam Young

Let’s Buildah Keystoneconfig

Buildah is a valuable tool in the container ecosystem. As an effort to get more familiar with it, and to finally get my hand-rolled version of Keystone to deploy on Kubernetes, I decided to work through building a couple of Keystone based containers with Buildah.

First, I went with the simple approach of modifying my old Dockerfiles to a later release of OpenStack, and kicking off the install using buildah. I went with Stein.

Why not Train? Because eventually I want to test zero-downtime upgrades. More on that later.

The buildah command was just:

 buildah bud -t keystone 

However, to make that work, I had to adjust the Dockerfile. Here is the diff:

diff --git a/keystoneconfig/Dockerfile b/keystoneconfig/Dockerfile
index 149e62f..cd5aa5c 100644
--- a/keystoneconfig/Dockerfile
+++ b/keystoneconfig/Dockerfile
@@ -1,11 +1,11 @@
-FROM index.docker.io/centos:7
+FROM docker.io/centos:7
 MAINTAINER Adam Young 
  
-RUN yum install -y centos-release-openstack-rocky &&\
+RUN yum install -y centos-release-openstack-stein &&\
     yum update -y &&\
     yum -y install openstack-keystone mariadb openstack-utils  &&\
     yum -y clean all
  
 COPY ./keystone-configure.sql /
 COPY ./configure_keystone.sh /
-CMD /configure_keystone.sh
\ No newline at end of file
+CMD /configure_keystone.sh

The biggest difference is that I had to specify the name of the base image without the “index.” prefix. Buildah is strictah (heh) in what it accepts.

I also updated the package to stein. When I was done, I had the following:

$ buildah images
REPOSITORY                 TAG      IMAGE ID       CREATED          SIZE
localhost/keystone         latest   e52d224fa8fe   13 minutes ago   509 MB
docker.io/library/centos   7        5e35e350aded   3 weeks ago      211 MB

What if I wanted to do these same things via manual steps? Following the advice from the community, I can translate from Dockerfile-ese to buildah. First, I can fetch the original image using the buildah from command:

container=$(buildah from docker.io/centos:7)
$ echo $container 
centos-working-container

Now Add things to the container. We don’t build a new layer with each command, so the && approach is not required. So for the yum installs:

buildah run $container yum install -y centos-release-openstack-stein
buildah run $container yum update -y
buildah run $container  yum -y install openstack-keystone mariadb openstack-utils
buildah run $container  yum -y clean all

To Get the files into the container, use the copy commands:

buildah copy $container  ./keystone-configure.sql / 
buildah copy $container ./configure_keystone.sh / 

The final steps: tell the container what command to run and commit it to an image.

buildah config --cmd /configure_keystone.sh $container
buildah commit $container keystone

What do we end up with?

$ buildah images
REPOSITORY                 TAG      IMAGE ID       CREATED              SIZE
localhost/keystone         latest   09981bc1e95a   About a minute ago   509 MB
docker.io/library/centos   7        5e35e350aded   3 weeks ago          211 MB

Since I have an old, hard-coded IP address for the MySQL server, it is going to fail. But let's see:

buildah run centos-working-container /configure_keystone.sh
2019-12-03T16:34:16.000691965Z: cannot configure rootless cgroup using the cgroupfs manager
Database

And there it hangs. We’ll work on that in a bit.

I committed the container before setting the author field. That should be a line like:
buildah config --author "ayoung@redhat.com"
to map line-to-line with the Dockerfile.

by Adam Young at December 03, 2019 04:43 PM

December 01, 2019

Thomas Goirand

Upgrading an OpenStack Rocky cluster from Stretch to Buster

Upgrading an OpenStack cluster from one version of OpenStack to another has become easier, thanks to the versioning of objects in the rabbitmq message bus (if you want to know more, see what oslo.versionedobjects is). But upgrading from Stretch to Buster isn’t easy at all, even with the same version of OpenStack (it is easier to run OpenStack Rocky backports on Stretch and upgrade to Rocky on Buster, rather than upgrading OpenStack at the same time as the system).

The reason it is difficult is that rabbitmq and corosync in Stretch can’t talk to the versions shipped in Buster. Also, in a normal OpenStack cluster deployment, services on all machines are constantly doing queries to the OpenStack API, and exchanging messages through the RabbitMQ message bus. One of the dangers, for example, would be if a Neutron DHCP agent could not exchange messages with the neutron-rpc-server. Your VM instances in the OpenStack cluster could then lose connectivity.

If a constantly-online HA upgrade with no downtime isn’t possible, it is however possible to minimize downtime to just a few seconds, if a correct procedure is followed. It took me more than 10 tries to be able to do everything in a smooth way, understanding and working around all the issues. 10 tries means installing an OpenStack cluster in Stretch 10 times (which, even if fully automated, takes about 2 hours) and trying to upgrade it to Buster. All of this is very time consuming, and I haven’t seen any web site documenting this process.

This blog post intends to document such a process, to save the readers the pain of hours of experimentation.

Note that this blog post assumes your cluster has been deployed using OCI (see: https://salsa.debian.org/openstack-team/debian/openstack-cluster-installer); however, it should also apply to any generic OpenStack installation, or even to any cluster running RabbitMQ and Corosync.

The root cause of the problem more in details: incompatible RabbitMQ and Corosync in Stretch and Buster

RabbitMQ in Stretch is version 3.6.6, and Buster has version 3.7.8. In theory, the documentation of RabbitMQ says it is possible to smoothly upgrade a cluster with these versions. However, in practice, the problem is the Erlang version rather than Rabbit itself: RabbitMQ in Buster will refuse to talk to a cluster running Stretch (the daemon will even refuse to start).

The same way, Corosync 3.0 in Buster will refuse to accept messages from Corosync 2.4 in Stretch.

Overview of the solution for RabbitMQ & Corosync

To minimize downtime, my method is to shut down RabbitMQ on node 1, and let all daemons (re-)connect to nodes 2 and 3. Then we upgrade node 1 fully, and then restart Rabbit on it. Then we shut down Rabbit on nodes 2 and 3, so that all daemons of the cluster reconnect to node 1. If done well, the only issue is if a message is still in the cluster of nodes 2 and 3 when daemons fail over to node 1. In reality, this isn’t really a problem, unless there’s a lot of activity on the OpenStack API. If that were the case (for example, if running a public cloud), then the advice would simply be to firewall the OpenStack API for the short upgrade period (which shouldn’t last more than a few minutes).
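
As an illustration of that temporary firewalling, something along these lines would do (a sketch only; 203.0.113.10 and port 443 are placeholders for your API VIP and port, not values from this cluster):

# Reject new connections to the API VIP while daemons fail over
iptables -I INPUT -d 203.0.113.10 -p tcp --dport 443 -m conntrack --ctstate NEW -j REJECT
# ... perform the RabbitMQ switch-over ...
# Remove the rule once everything has reconnected
iptables -D INPUT -d 203.0.113.10 -p tcp --dport 443 -m conntrack --ctstate NEW -j REJECT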

Then we upgrade node 2 and 3 and make them join the newly created RabbitMQ cluster in node 1.

For Corosync, node 1 will not start the VIP resource before node 2 is upgraded and both nodes can talk to each other. So we just upgrade node 2, and turn off the VIP resource on node 3 as soon as it is up on nodes 1 and 2 (which happens during the upgrade of node 2).

The above should be enough reading for most readers. If you’re not that much into OpenStack, it’s ok to stop reading this post. For those who are more involved users of OpenStack on Debian deployed with OCI, let’s go into more detail…

Before you start: upgrading OCI

In previous versions of OCI, the haproxy configuration was missing a “option httpcheck” for the MariaDB backend, and therefore, if a MySQL server on one node was going down, haproxy wouldn’t detect it, and the whole cluster could fail (re-)connecting to MySQL. As we’re going to bring some MySQL servers down, make sure the puppet-master is running with the latest version of puppet-module-oci, and that the changes have been applied in all OpenStack controller nodes.

Upgrading compute nodes

Before we upgrade the controllers, it’s best to start with the compute nodes, which are the most straightforward. The easiest way is to live-migrate all VMs away from the machine before proceeding. First, we disable the node, so no new VM can be spawned on it:

openstack compute service set --disable z-compute-1.example.com nova-compute

Then we list all VMs on that compute node:

openstack server list --all-projects --host z-compute-1.example.com

Finally we migrate all VMs away:

openstack server migrate --live hostname-compute-3.infomaniak.ch --block-migration 8dac2f33-d4fd-4c11-b814-5f6959fe9aac
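
If there are many instances, a small loop along these lines drains the node, reusing the same --live syntax as above (a sketch; the target host is a placeholder, and client flags vary between OpenStack client releases):

for vm in $(openstack server list --all-projects --host z-compute-1.example.com -f value -c ID); do
    openstack server migrate --live z-compute-2.example.com --block-migration "$vm"
done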

Now we can do the upgrade. First disable puppet, then tweak the sources.list, upgrade and reboot:

puppet agent --disable "Upgrading to buster"
apt-get remove python3-rgw python3-rbd python3-rados python3-cephfs librgw2 librbd1 librados2 libcephfs2
rm /etc/apt/sources.list.d/ceph.list
sed -i s/stretch/buster/g /etc/apt/sources.list
mv /etc/apt/sources.list.d/stretch-rocky.list /etc/apt/sources.list.d/buster-rocky.list
echo "deb http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main
deb-src http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main" >/etc/apt/sources.list.d/buster-rocky.list
apt-get update
apt-get dist-upgrade
reboot

Then we simply re-apply puppet:

puppet agent --enable ; puppet agent -t
apt-get purge linux-image-4.19.0-0.bpo.5-amd64 linux-image-4.9.0-9-amd64

Then we can re-enable the compute service:

openstack compute service set --enable z-compute-1.example.com nova-compute

Repeat the operation for all compute nodes; then we’re ready for the upgrade of the controller nodes.

Removing Ceph dependencies from nodes

Most likely, if running OpenStack Rocky on Stretch, you’d be running with upstream packages for Ceph Luminous. When upgrading to Buster, there’s no upstream repository anymore, and packages will use Ceph Luminous directly from Buster. Unfortunately, the packages from Buster are at a lower version than the packages from upstream. So before upgrading, we must remove all Ceph packages from upstream. This is what was done just above for the compute nodes as well. Upstream Ceph packages are easily identifiable, because upstream uses “bpo90” instead of what we do in Debian (ie: bpo9), so the operation can be:

apt-get remove $(dpkg -l | grep bpo90 | awk '{print $2}' | tr '\n' ' ')

This will remove python3-nova, which is fine as it is also running on the other 2 controllers. After switching the /etc/apt/sources.list to buster, Nova can be installed again.

In a normal setup by OCI, here’s the sequence of command that needs to be done:

rm /etc/apt/sources.list.d/ceph.list
sed -i s/stretch/buster/g /etc/apt/sources.list
mv /etc/apt/sources.list.d/stretch-rocky.list /etc/apt/sources.list.d/buster-rocky.list
echo "deb http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main
deb-src http://stretch-rocky.debian.net/debian buster-rocky-proposed-updates main" >/etc/apt/sources.list.d/buster-rocky.list
apt-get update
apt-get dist-upgrade
apt-get install nova-api nova-conductor nova-consoleauth nova-consoleproxy nova-placement-api nova-scheduler

You may notice that we’re replacing the Stretch Rocky backports repository with one for Buster. Indeed, even though all of Rocky is in Buster, there are a few packages still pending review by the Debian stable release team before they can be uploaded to Buster, and we need those fixes for a smooth upgrade. See release team bugs #942201, #942102, #944594, #941901 and #939036 for more details.

Also, since we only did an “apt-get remove”, the Nova configuration in nova.conf stayed in place and Nova is already configured, so when we reinstall the services we removed along with the Ceph dependencies, they will be ready to go.

Upgrading the MariaDB galera cluster

In an HA OpenStack cluster, typically, a Galera MariaDB cluster is used. That isn’t a problem when upgrading from Stretch to Buster, because the on-the-wire format stays the same. However, the xtrabackup library in Stretch is shipped by the MariaDB packages themselves, while in Buster, one must install the mariadb-backup package. As a consequence, the best approach is to simply turn off MariaDB on a node, do the Buster upgrade, install the mariadb-backup package, and restart MariaDB. To prevent the MariaDB package from attempting to restart the mysqld daemon, it is best to mask the systemd unit:

systemctl stop mysql.service
systemctl disable mysql.service
systemctl mask mysql.service
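
After the Buster upgrade of that node, the reverse steps would look roughly like this (a sketch following the same unmask pattern used for RabbitMQ below, not copied from the original procedure):

[ ... do the Buster upgrade fully ... ]
apt-get install mariadb-backup
systemctl unmask mysql.service
systemctl enable mysql.service
systemctl start mysql.service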

Upgrading rabbitmq-server

Before doing anything, make sure all of your cluster is running with python3-oslo.messaging version >= 8.1.4. Indeed, version 8.1.3 suffers from a bug where daemons would constantly attempt to reconnect to the same server, instead of trying each of the servers described in the transport_url directive. Note that I’ve uploaded 8.1.4-1+deb10u1 to Buster, and that it is part of the 10.2 Buster point release. Upgrading oslo.messaging will not restart the daemons automatically though: this must be done manually.

The strategy for RabbitMQ is to completely upgrade one node, start Rabbit on it without any clustering, then shut down the service on the other 2 nodes of the cluster. If this is performed fast enough, no message will be lost in the message bus. However, there are a few traps. Running “rabbitmqctl forget_cluster_node” only removes a node from the cluster for those that will still be running. It doesn’t remove the other nodes from the one we want to upgrade. The way I’ve found to solve this is to simply remove the mnesia database of the first node, so that when it starts, RabbitMQ doesn’t attempt to cluster with the other 2, which are running a different version of Erlang. If it did, it would just fail and refuse to start.

However, there’s another issue to take care of. When upgrading the 1st node to Buster, we removed Nova because of the Ceph issue. Before we restart the RabbitMQ service on node 1, we need to install Nova, so that it will connect to either node 2 or 3. If we don’t do that, then Nova on node 1 may connect to the RabbitMQ service on node 1, which at this point is a different RabbitMQ cluster than the one on nodes 2 and 3.

rabbitmqctl stop_app
systemctl stop rabbitmq-server.service
systemctl disable rabbitmq-server.service
systemctl mask rabbitmq-server.service
[ ... do the Buster upgrade fully ...]
[ ... reinstall Nova services we removed when removing Ceph ...]
rm -rf /var/lib/rabbitmq/mnesia
systemctl unmask rabbitmq-server.service
systemctl enable rabbitmq-server.service
systemctl start rabbitmq-server.service

At this point, since the node 1 RabbitMQ service was down, all daemons are connected to the RabbitMQ service on node 2 or 3. Removing the mnesia database removes all the credentials previously added to RabbitMQ. If nothing is done, OpenStack daemons will not be able to connect to the RabbitMQ service on node 1. If, like me, you are using a config management system to populate the access rights, it’s rather easy: simply re-apply the puppet manifests, which will re-add the credentials. However, that isn’t enough: the RabbitMQ message queues are created when the OpenStack daemons start. As I experienced, daemons will reconnect to the message bus, but will not recreate the queues unless the daemons are restarted. Therefore, the sequence is as follows:

Do “rabbitmqctl start_app” on the first node. Add all credentials to it. If your cluster was set up with OCI and puppet, simply look at the output of “puppet agent -t --debug” to capture the list of commands to perform the credential setup.

Do a “rabbitmqctl stop_app” on both remaining nodes 2 and 3. At this point, all daemons will reconnect to the only remaining server. However, they won’t be able to exchange messages, as the queues aren’t declared. This is when we must restart all daemons on one of the controllers. The whole operation normally doesn’t take more than a few seconds, which is how long your message bus won’t be available. To make sure everything works, check the logs in /var/log/nova/nova-compute.log on one of your compute nodes to make sure Nova is able to report its configuration to the placement service.

Once all of this is done, there’s nothing more to worry about for RabbitMQ, as all daemons of the cluster are connected to the service on node 1. However, one must make sure that, while upgrading nodes 2 and 3, no daemon reconnects to the message service on those nodes. So it is best to simply stop, disable and mask the service with systemd before continuing. Then, when restarting the Rabbit service on nodes 2 and 3, OCI’s shell script “oci-auto-join-rabbitmq-cluster” will make them join the new Rabbit cluster, and everything should be fine regarding the message bus.

Upgrading corosync

In an OpenStack cluster setup by OCI, 3 controllers are typically setup, serving the OpenStack API through a VIP (a Virtual IP). What we call a virtual IP is simply an IP address which is able to move from one node to another automatically depending on the cluster state. For example, with 3 nodes, if one goes down, one of the other 2 nodes will take over hosting the IP address which serves the OpenStack API. This is typically done with corosync/pacemaker, which is what OCI sets up.
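
For illustration, a VIP resource of this kind is typically defined in pacemaker along these lines (a hedged sketch using crmsh; the IP address and netmask are placeholders, and OCI normally creates the real resource for you):

crm configure primitive openstack-api-vip ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=10s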

Upgrading corosync is easier than dealing with RabbitMQ. The first node will refuse to start the corosync resource if it can’t talk to at least a 2nd node. Therefore, upgrading the first node is transparent until we touch the 2nd node: the openstack-api resource won’t be started on the first node, so we can finish the upgrade on it safely (i.e. take care of RabbitMQ as described above). The first thing to do is probably to move the resource to the 3rd node:

crm_resource --move --resource openstack-api-vip --node z-controller-3.example.com

Once the first node is completely upgraded, we upgrade the 2nd node. When it is up again, we can check the corosync status to make sure the resource is running on both nodes 1 and 2:

crm status

If we see the service is up on node 1 and 2, we must quickly shutdown the corosync resource on node 3:

crm resource stop openstack-api-vip

If that’s not done, then node 3 may also reclaim the VIP, and therefore 2 nodes may claim it at the same time. If the VIP is running over a plain L2 network, switches will normally deliver traffic to only one of the machines declaring the VIP, so even if we don’t take care of it immediately, the upgrade should be smooth anyway. If, like I do in production, you’re running with BGP (OCI allows one to use BGP for the VIP, or simply use an IP on a normal L2 network), then the situation is even better, as the peering router will continue to route to one of the controllers in the cluster. So no stress: this must be done, but there is no need to hurry as much as for the RabbitMQ service.

Finalizing the upgrade

Once node 1 and 2 are up, most of the work is done, and the 3rd node can be upgraded without any stress.

Recap of the procedure for controllers

  • Move all SNAT virtual routers running on node 1 to node 2 or 3 (note: this isn’t needed if the cluster has network nodes).
  • Disable puppet on node 1.
  • Remove all Ceph libraries from upstream on node 1, which also turns off some Nova services that depend on them at runtime.
  • Shut down RabbitMQ on node 1, including masking the service with systemd.
  • Upgrade node 1 fully to Buster, then reboot it. This will probably trigger MySQL re-connections to node 2 or 3.
  • Install mariadb-backup, start the mysql service, and make sure MariaDB is in sync with the other 2 nodes (check the log files).
  • Reinstall the missing Nova services on node 1.
  • Remove the mnesia db on node 1.
  • Start RabbitMQ on node 1 (which now isn’t part of the RabbitMQ cluster on nodes 2 and 3).
  • Disable puppet on node 2.
  • Populate RabbitMQ access rights on node 1. This can be done by simply applying puppet, but that may be dangerous if puppet restarts the OpenStack daemons (which may then connect to the RabbitMQ on node 1), so it is best to re-apply only the grant-access commands.
  • Shut down RabbitMQ on nodes 2 and 3 using “rabbitmqctl stop_app”.
  • Quickly restart all daemons on one controller (for example the daemons on node 1) to declare the message queues. Now all daemons should be reconnected to, and working with, the RabbitMQ cluster on node 1 alone.
  • Re-enable puppet, and re-apply puppet on node 1.
  • Move all Neutron virtual routers from node 2 to node 1.
  • Make sure the RabbitMQ services are completely stopped on nodes 2 and 3 (mask the service with systemd).
  • Upgrade node 2 to Buster (shutting down RabbitMQ completely, masking the service so it doesn’t restart during the upgrade, removing the mnesia db for RabbitMQ, and finally making it rejoin the new single-node cluster on node 1 using oci-auto-join-rabbitmq-cluster: normally, puppet does that for us).
  • Reboot node 2.
  • When corosync on node 2 is up again, check the corosync status to make sure nodes 1 and 2 are clustering (maybe the resource on node 1 needs to be started), and shut down the corosync “openstack-api-vip” resource on node 3 to avoid the VIP being declared on both nodes.
  • Re-enable puppet and run puppet agent -t on node 2.
  • Make sure the node 2 rabbitmq-server has joined the new cluster declared on node 1 (run: rabbitmqctl cluster_status) so we have HA for Rabbit again.
  • Move all Neutron virtual routers of node 3 to node 1 or 2.
  • Upgrade node 3 fully, reboot it, make sure Rabbit is clustered with nodes 1 and 2 and corosync is working too, then re-apply puppet again.

Note that we do need to re-apply puppet each time, because of some differences between Stretch and Buster. For example, Neutron in Rocky isn’t able to use iptables-nft, and puppet needs to run an update-alternatives command to select iptables-legacy instead (I’m writing this because it isn’t obvious: sometimes Neutron simply fails to parse the output of iptables-nft…).
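
For reference, the switch puppet performs is essentially the standard Debian alternatives change (shown here as a hedged sketch; these are the stock Buster commands rather than anything OCI-specific):

update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy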

Last words as a conclusion

While OpenStack itself has made a lot of progress on upgrades, it is very disappointing that the components OpenStack relies on (like corosync, which is typically used as the provider of high availability) aren’t designed with backward compatibility in mind. It is also disappointing that the Erlang versions in Stretch and Buster are incompatible in this way.

However, with the correct procedure, it’s still possible to keep services up and running, with a very small downtime, even to the point that a public cloud user wouldn’t notice it.

As the procedure isn’t easy, I strongly suggest that anyone attempting such an upgrade practice first. With OCI, it is easy to run a PoC using the openstack-cluster-installer-poc package, which is the perfect environment to train on: it’s easy to reproduce, reinstall a cluster and restart the upgrade procedure.

by Goirand Thomas at December 01, 2019 04:45 PM

November 28, 2019

Aptira

Comparison of Software Defined Networking (SDN) Controllers. Part 8: Tungsten Fabric

Aptira Comparison of Software Defined Networking (SDN) Controllers. Tungsten Fabric

The previous Software Defined Networking (SDN) controllers in this series might help users and organisations to choose the right SDN controller for their platform, one that matches their network infrastructure and requirements. These controllers could be a suitable choice for Communication Service Providers (CSPs), data centres and research, or for integration with other platforms. However, in the current IT market, organisations are migrating their old infrastructure to the Cloud and cloudifying every part of it. As such, we will now look at one of the SDN controllers which has been designed to work in a cloud-grade network – Tungsten Fabric (TF).

TF can be a suitable choice for cloud builders and cloud-native platform engineers. It was first associated with Juniper but is now under the Linux Foundation umbrella.

Architecture

Tungsten Fabric’s architecture is composed of two major software components: the TF vRouter and the TF Controller.

Aptira Tungsten Fabric Architecture
The TF vRouter is used for packet forwarding and for applying network and security policies to the devices in the network.

  • vRouters need to run on each host or compute node in the network. They replace the Linux bridge and traditional iptables routing stack, or Open vSwitch networking, on the compute hosts.
  • The TF Controller communicates with the vRouters via Extensible Messaging and Presence Protocol (XMPP) to apply the desired networking and security policies.

The TF Controller consists of the following software services:

  • Control and Configuration services for communicating with vRouters and maintaining the network topology and network policies.
  • Analytics services for telemetry and troubleshooting.
  • Web UI services for interacting with users.
  • And finally, services to provide integration with private and public clouds, CNI plugins, virtual machines and bare metal.

Tungsten Fabric version 5.0 and later uses a microservices architecture based on Docker containers, as shown in the figure below, to deploy the services mentioned above. This makes the controller resilient against failure and highly available, which improves the user experience.

Aptira Tungsten Fabric Architecture

Modularity and Extensibility

TF’s microservice-based architecture allows particular services to be developed and scaled according to performance requirements and increasing load. Also, microservices are modular by nature, which makes maintenance and extensibility of the platform easy whilst isolating failures of services from each other.

Scalability

Cluster Scalability

  • TF approaches cluster scalability in a modular fashion. This means each TF role can be scaled horizontally by adding more nodes for that role. Also, the number of pods for each node is scalable. Zookeeper is used to choose the active node, so the number of pods deployed on the Controller and Analytics nodes must be an odd number, as required by the Zookeeper quorum algorithm.

Architectural Scalability

  • TF supports the BGP protocol, and each TF controller can be connected to other controllers via BGP. This means TF can be used to connect different SDN islands.

Interfaces

  • Southbound: TF uses the XMPP protocol to communicate with the vRouters (data plane) to deliver the overlay SDN solution. BGP can also be used to communicate with legacy devices.
  • Northbound: TF supports a Web GUI and RESTful APIs. Plug-ins integrate with other platforms such as orchestrators, clouds and OSS/BSS.

Telemetry

Analytics nodes extract usable telemetry information from the infrastructure. The data can then be normalised to a common format, and the output is sent via the Kafka service into a Cassandra database. This data can be used operationally in a multitude of ways, from problem solving to capacity planning. Redis uses the data for generating graphs and running queries. The Redis pod is deployed between the analytics pod and the Web UI pod.

Resilience and Fault Tolerance

The modular architecture of Tungsten Fabric makes it resilient against failure, with typically several controllers/pods running on several servers for high availability. Also, the failure of a service is isolated, so it does not affect the whole system. The API and Web GUI services are accessed through a load balancer. The load balancer can allow pods to be in different subnets.

Programming Language

TF supports C++, Python, Go, Node.js.

Community

TF was first associated with Juniper but is now supported under the Linux Foundation Networking umbrella and boasts a large developer and user community.

Conclusion

Given this evaluation, TF is a suitable choice for cloud builders and cloud-native platform engineers. This is because it works flexibly with private and public clouds, CNI plugins, virtual machines and bare metal. Depending on the orchestrator it is integrated with, it exposes Heat APIs, Kubernetes APIs, etc. to instantiate network and security policies. The scalability of TF makes it highly available and resilient against failure, which improves the user experience. Finally, its modularity allows users to easily customise, read, test and maintain each module separately.


The post Comparison of Software Defined Networking (SDN) Controllers. Part 8: Tungsten Fabric appeared first on Aptira.

by Farzaneh Pakzad at November 28, 2019 12:48 PM

November 26, 2019

OpenStack Superuser

Inside open infrastructure: The latest from the OpenStack Foundation

Welcome to the latest edition of the OpenStack Foundation Open Infrastructure newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

Spotlight on the Open Infrastructure Summit Shanghai

Attendees from over 45 countries came to the Open Infrastructure Summit hosted in Shanghai earlier this month, followed by the Project Teams Gathering (PTG). Use cases, tutorials, and demos covering 40+ open source projects including Airship, Ceph, Hadoop, Kata Containers, Kubernetes, OpenStack, StarlingX, and Zuul were featured at the Summit.

With the support of the active Open Infrastructure community in China, the market share of OpenStack in the APAC region is expected to increase by 36% in the next four years (451 Research report: OpenStack Market Monitor, 451 Research, September 2019). Currently, China is the second largest market adopting OpenStack software, and it ranks second in code contributions to the latest OpenStack release, Train. As Jonathan Bryce said in the keynotes, “The Summits bring our community members together to meet face to face, advancing the software we build and use daily.”
Check out the highlights of the Open Infrastructure Summit Shanghai:

  • In the Monday morning keynotes, Guohua Xi, the President of the China Communications Standards Association (CCSA), kicked off the event by sharing a call to action for the Chinese community to encourage cross-community collaboration to drive innovation. Open Infrastructure users including Baidu, China Mobile, China Telecom, China Unicom, Intel, and Tencent also gave keynotes and shared the key role of open source projects, such as Kata Containers and OpenStack, in their 5G and container business strategies. Keynote videos are now available here
  • In breakout sessions, Alibaba, Baidu and Tencent presented their Open Infrastructure use cases, highlighting the integration of multiple technologies including Ceph, Kata Containers, Kubernetes, OpenStack, and more. China Railway, China Mobile, Walmart Labs, Line and China UnionPay are among additional Open Infrastructure users who shared their innovations and open source best practices at the Shanghai Summit. Breakout session videos are being added here
  • For its latest release Train, OpenStack received 25,500 code changes by 1,125 developers from 150 different companies. This pace of development makes OpenStack one of the top three most active open source projects in the world alongside Chromium and Linux. 
  • Selected by members of the OSF community, Baidu ABC Cloud Group and Edge Security Team won the Superuser Award for the unique nature of its Kata Containers and OpenStack use case as well as its integration and application of open infrastructure.
  • Combining OpenStack and Kubernetes to address users’ infrastructure needs at scale, Airship joined Kata Containers and Zuul as confirmed Open Infrastructure Projects supported by the OpenStack Foundation. SKT, Intel, Inspur and other companies presented their Airship use cases for developing infrastructure solutions.
  • Congratulations to Troila for being elected as a new Gold Member of the OpenStack Foundation! Learn more about it here

Summit keynote videos are already available, and breakout videos will be available on the Open Infrastructure videos page in the upcoming weeks. Thank you to our Shanghai Summit sponsors for supporting the event!

OpenStack Foundation (OSF)

  • The next OSF event will be a collaboration-centric event, happening in Vancouver, Canada June 8-11, 2020. Mark your calendars!
  • Troila was elected as a new Gold Member for the OpenStack Foundation at the Shanghai Board of Directors meeting.

Airship: Elevate your infrastructure

  • Last month, Airship was confirmed by OSF as a top level project — congratulations to the community!
  • The Airship community has made significant progress in Airship 2.0. 17% of planned work was completed, and another 18% is in progress and/or in review. The community is looking for more developers to contribute code. Interested in getting involved? Check out this page.

Kata Containers: The speed of containers, the security of VMs

OpenStack: Open source software for creating private and public clouds

  • Several OpenStack project teams, SIGs and working groups met during the Project Teams Gathering in Shanghai to prepare the Ussuri development cycle. Reports are starting to be posted to the openstack-discuss mailing-list.
  • Sławek Kapłoński, the Neutron PTL, recently reported that neutron-fwaas, neutron-vpnaas, neutron-bagpipe and neutron-bgpvpn are lacking interested maintainers. The Neutron team will drop those modules from future official OpenStack releases if nothing changes by the ussuri-2 milestone, February 14. If you are using those features and would like to step up to help, now is your chance!
  • We are looking for a name for the ‘V’ release of OpenStack, to follow the Ussuri release. Learn more about it in this post by Sean McGinnis
  • The next OpenStack Ops meetup will happen in London, UK on January 7-8. Stay tuned for registration information!

StarlingX: A fully featured cloud for the distributed edge

  • The StarlingX community met during the Project Teams Gathering in Shanghai to discuss topics like 4.0 release planning, documentation and how to improve the contribution process. You can check notes on their etherpad for the event.
  • The upcoming StarlingX 3.0 release will contain the Train version of OpenStack. The community is working on some last bits including testing and bug fixes before the release in December. You can find more information in StoryBoard about the release.

Zuul: Stop merging broken code

  • The Open Infrastructure Summit in Shanghai included a variety of talks, presentations, and discussions about Zuul; a quick project update from lead Zuul maintainer James Blair during keynotes set the tone for the days which followed.

Find the OSF at these upcoming Open Infrastructure community events

Questions / feedback / contribute

This newsletter is written and edited by the OSF staff to highlight open infrastructure communities. We want to hear from you! If you have feedback, news or stories that you want to share, reach us through community@openstack.org . To receive the newsletter, sign up here.

The post Inside open infrastructure: The latest from the OpenStack Foundation appeared first on Superuser.

by Allison Price at November 26, 2019 09:11 PM

StackHPC Team Blog

StackHPC at Supercomputing 2019

Stig Telfer presenting at SuperCompCloud

Supercomputing is massive, that much is clear. The same convention centre used for the Open Infrastructure summit earlier in the year was packed to the rafters, and the technical program schedule included significant content addressing the convergence of HPC and Cloud - StackHPC's home territory.

SuperCompCloud Workshop

SuperCompCloud is the Workshop on Interoperability of Supercomputing and Cloud Technologies. Supercomputing 2019 was the first edition of this workshop with a steering committee drawn from CSCS Switzerland, Indiana University, Jülich Supercomputing Centre, Los Alamos National Laboratory, University of Illinois, US Department of Defense and Google.

The program schedule included some very prestigious speakers, and StackHPC was thrilled to be included:

The OpenStack Scientific SIG at Supercomputing

Part of the OpenStack Scientific SIG's remit is to advocate for the use of OpenStack at scientific computing conferences, and a BoF at Supercomputing is an ideal forum for making that case. A panel of regular participants from the Scientific SIG gathered to describe their use cases and discuss the pros and cons of private cloud.

The OpenStack Scientific SIG panel

The SIG BoF panel L-R: Mike Lowe (Indiana University), Bob Budden (NASA GSFC), Blair Bethwaite (NESI), Stig Telfer (StackHPC), Tim Randles (LANL), Martial Michel (Data Machines)

The BoF, titled Cloud and Open Infrastructure Solutions To Run HPC Workloads, was well attended, with good audience participation around issues such as VM performance for I/O-intensive workloads, and OpenStack's overall health.

Get in touch

If you would like to get in touch we would love to hear from you. Reach out to us via Twitter or directly via our contact page.

by Stig Telfer at November 26, 2019 09:00 AM

November 25, 2019

Ghanshyam Mann

Recap of Open Infrastructure Summit & PTG, Shanghai 2019

Open Infrastructure Summit, Shanghai 2019

Open Infrastructure Summit followed by OpenStack PTG was held in Shanghai, China, from 4th Nov 2019 till 8th Nov 2019. The first 3 days were for the Summit, a marketplace event including Forum sessions, and the last 3 days were for the Project Team Gathering (PTG), with one day of overlap.

I arrived in Shanghai on 1st Nov to participate in pre-summit events like Upstream Training and Board of Directors meeting.

    Upstream Institute Training Shanghai:

Like at other Summits, Upstream training was held in Shanghai over 1.5 days: a half day on 2nd Nov and a full day on 3rd Nov. Thanks to Lenovo and Jay for sponsoring the training this time too.

Etherpad

The first day had 9 mentors and ~20 students. It covered the introduction, registration and governance parts, including VM image setup etc. Students came from different countries, for example South Korea, India and of course China. Two developers from South Korea were interested in contributing to Swift; they later joined the Swift PTG and interacted with the team. One developer from India is doing cloud testing of their baremetal nodes via QA tooling; I had further discussion with him at the QA PTG. I am always happy to get this kind of interaction during training, and it is useful for getting attendees on board with upstream activities.

The second day had fewer mentors and more students. A few other mentors and I could not participate in the training due to the Joint Leadership meeting.

    Ussuri cycle community-wide goals discussion:

Three goals were discussed in detail and how to proceed with each of them. Etherpad.

    Drop Python 2.7 Support:

Ussuri is the time to drop python 2 support from OpenStack. The plan and schedule were already discussed during a TC office hour and on the ML. It was agreed to make this a community-wide goal. We discussed keeping the CI/CD support for Swift, which is the only project keeping py2 support. Swift needs devstack to keep installing in a py2 env, with the rest of the services on py3 (the same as the old jobs when Swift was on py2 by default in devstack). There is no oslo dependency from Swift, and all the other dependencies will be capped to their py2 versions. The requirements check job currently checks whether openstack/requirements lists two entries for a requirement. smcginnis's patch to change the requirements check has already merged. Everything else will go as discussed on the ML. The work on this has already started and patches for all the services are up for review now.

    Project Specific New Contributor & PTL Docs

As per feedback in the Forum sessions, this is a good goal which will make documentation more consistent. All the projects should edit their contributor.rst to follow a more complete template and adjust/add PTL documentation. This is accepted as a pre-approved Ussuri goal. Kim Hindhart is working on getting EU funding for people to work on OpenStack, and they like consistent documentation.

    Switch remaining legacy jobs to Zuul v3 and drop legacy support

Many projects are still not ready for this goal. The Grenade job is not yet on zuulv3, and that needs to finish first. A few projects are waiting for the big projects to finish the zuulv3 migration first. This needs more work and can be a “pre-approved” goal for V, split so as to focus on the Grenade work in U. We will continue to review the proposed goal and pre-work etc.

Other than the above 3 goals, there were a few more ideas for goal candidates, good to go into the goal backlogs etherpad:
– cdent: stop using paste, pastedeploy and WSME.
Note from Chris: this does not need to be a community goal as such, but requires a common solution from the TC. WSME is still used, has contributions, and at least a core or two.

– cmurphy: Consistent and secure default policies. As per the forum discussion this is going with pop-up team first.

– Support matrix documentation to be consistent across projects. This is going with a pop-up team first (fungi can propose the pop-up team in governance), with Richard Pioso (rpioso) helping fungi on this. Once a consistent framework is identified, the pop-up team can expire with the approval of a related cycle goal for implementing it across the remaining projects.

    OpenStack QA PTG & Forum Sessions Summary:

I wrote a separate blog to summarize the QA discussions that happened in Forum or PTG.

    Nova API Policies defaults:

Etherpad.

Nova planned to implement the default policy refresh by adopting the system scope and new default roles available in keystone. This was planned for the Train cycle when the spec was merged, but the implementation could not start. The Nova spec has already merged for the Ussuri cycle. The main challenge for this work is how to complete it in a single cycle, so that users are not impacted by the upgrade more than once. We discussed various options, like a flag to suppress the deprecation warning or the new policy enforcement, and getting all the reviews up while keeping a procedural hold on the first patch so that later we can merge all of them together. Getting the code up after the first set merges, and more active review, will be required for this. The keystone team will help in reviewing the changes. I am very positive this can be completed in the Ussuri cycle.

    Technical Committee:

Friday was the full day for Technical Committee discussions. It started with some fun when JP collected the number of TC members interested per topic, with the least interesting topic to be discussed first :). He did a good job organizing the discussion with time-based checks. I am summarizing a few of the topics below:

    Select U release goals:

This session was to select the Ussuri goals. The Ussuri cycle has already started, so we have to finalize the goals asap. We agreed to proceed with the below two goals for the Ussuri cycle.

1. Drop Python 2.7 Support – Already Accepted.
2. Project Specific New Contributor & PTL Docs – Under Review

The goal “Switch remaining legacy jobs to Zuul v3 and drop legacy support” will be pre-selected for the V cycle; that does not mean stopping review of the proposed goal or any ongoing work. All ongoing efforts will continue on this.

    Structurize the goal select schedule

Since the Summit and PTG events were merged, we have not done well on community-wide goal selection. It happened late in the Train cycle, and the same is true for Ussuri. Rico and I discussed having a schedule for goal selection in the TC. The main aim is to finalize the cycle community-wide goals before the cycle starts. This etherpad has the drafted schedule, which all TC members agreed on. The first cycle enforcing this schedule will be challenging, but let's see how it goes. I will propose the schedule into the TC documents soon.

    Python 2 drop current progress and next steps: 

During the PTG, many projects discussed their plans to drop python2 support. Cinder decided to just remove py2 compatibility and then make changes to backports in the cases where py27 is needed. Glance has greater challenges around WSGI. Other projects are almost good to go on this. Devstack also defaults to py3 now.

    Release naming convention:

This was again a not-so-fun discussion. Fortunately, we concluded it. The agreement is that any community member can propose a name and the TC will select the final one.

A few more topics were discussed at the PTG, which can be found in this Etherpad or on the ML.

by Ghanshyam Mann at November 25, 2019 05:04 AM

November 24, 2019

Ghanshyam Mann

Open Infrastructure Summit, Shanghai 2019: QA Summit & PTG Summary

OpenStack Quality Assurance Summit & PTG Summary

Open Infrastructure Summit, Shanghai 2019

Open Infrastructure Summit followed by OpenStack PTG was held in Shanghai, China: 4th Nov 2019 till 8th Nov 2019.

The first 3 days were for Summit where we had the forum sessions about user feedback on QA tooling on Monday and the last 3 days for Project Team Gathering (PTG) with one day overlap.

QA Forum sessions

    OpenStack QA – Project Update:  Wednesday, November 6, 10:15am-10:30am

We gave updates on what we finished in Train and a draft plan for the Ussuri cycle. Due to fewer contributors in QA, Train cycle activity decreased compared to Stein. We tried to maintain the daily QA activity and finished a few important things.

Slides: QA Project Update

    Users / Operators adoption of QA tools / plugins: Mon 4, 1:20pm – 2:00pm

Etherpad. This was another useful session for QA to get feedback as well as information about downstream tooling.

A few tools we talked about:

  • Fault injection tests

One big concern shared by a few people was the long time it takes to get Tempest patches merged. One idea to solve this is to bring critical reviews to Office hours.

 

  QA PTG: 6th – 8th Nov:

It was a small gathering this time, with one day of PTG on Wednesday. Even with a small number of developers, we had good discussions on many topics. I am summarizing the discussions below:

Etherpad.

  Train Retrospective  

The retrospective brought up a few key issues where we need improvement. We collected the action items below, including bug triage. Untriaged QA bugs are increasing day by day.

  • Action:
    • need to discuss blacklist plugins and how to notify and remove them if dead – gmann
    • start the process of community-goal work in QA – masayuki
    • sprint for bug triage with number of volunteers – 
      • (chandankumar)Include one bug in each sprint in TripleO CI tempest member
      • Traige the new bug and then pick the bug based on priority
      • For tripleo Ci team we will track here: https://tree.taiga.io/project/tripleo-ci-board/ – chandankumar

  How to deal with an aging testing stack. 

With testtools not being very active, we need to think about alternatives or the most suitable options to solve this issue. We discussed a few options which need to be discussed further on the ML.

  • Can we fork the dependencies of testtools in Tempest or stestr?
  • As we are removing py2.7 support in Tempest, we can completely ignore/remove the unittest2 things, but that is not the case for testtools?
  • Remove the support of unittest2 from testtools? py2.7 is going away from everywhere and testtools could create a tag or something for py2.7 usage?
  • Since Python 2 is going EOL on 1st Jan 2020, let's create a tag and replace unittest2 with unittest for python3-only releases

Action:

  • Document the officially supported test runners for Tempest. – Soniya Vyas/Chandan Kumar
  • ML to discuss the above options – gmann 

  Remove/migrate the .testr.conf to .stestr

60 openstack/* repositories have both .stestr.conf AND .testr.conf. We don't need to keep both files. Let's take a look at some of them and make a plan to remove them if we can.

If both exist, then remove .testr.conf and verify that the .stestr.conf has the correct test path. If only .testr.conf exists, then migrate it to .stestr.conf.
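
For reference, a minimal .stestr.conf looks something like the following (the test path is a placeholder; use the project's actual unit test directory):

[DEFAULT]
test_path=./myproject/tests/unit
top_dir=./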

We need to figure out the purpose of the pbr .testr.conf code before removing it. Is this just old code, or is it necessary?

  Moving subunit2html script from os-testr

The os-testr runner piece of the os-testr project is deprecated, but the subunit2html tool still lives there and is widely used across the OpenStack ecosystem. Can we move it somewhere else? I do not see any benefit in moving those scripts to other places. We asked chandan to open an issue on stestr to discuss moving them to the stestr repo. mtreinish replied on this: os-testr was meant to be the place in openstack to host the ostestr runner wrapper/script, subunit2html, generate_subunit, etc. Just because ostestr is deprecated and being removed doesn't mean it's not the proper home for those other tools.

  Separate integrated services tests can be used in TripleO CI

TripleO CI maintains a separate file to run dependent tests per service. Tempest has tox environments and integrated jobs for dependent services, and the same can be used in TripleO CI.

For example:

  • tox for networking.

  RBAC testing strategy

This was a cross-project session on a positive/negative testing strategy for system scope and the new defaults in keystone. Keystone has implemented the new defaults and system scope in its policies and added unit tests to cover the new policies. Nova is implementing the same in the Ussuri cycle. As discussed at the Denver PTG, Tempest will implement the new credentials for all 9 personas available in keystone and slowly migrate the tests to start using the new policies. That will be done via a flag switching Tempest to use system scope or the new defaults, and that flag will be false by default to keep using the old policies for stable branch testing.

We can use patrole tests or implement new tests in the Tempest plugin and verify the response. Both have the issue of performing the complete operation, which is not always required for policy verification. Running full functional tests is expensive and duplicates existing tests. One solution for that (we talked about it at the Denver PTG also) is a flag, similar to os-profiler, to just do the policy check and return the API response with a specific return code.

AGREE:

  • Tempest to provide all 9 personas available from keystone. Slowly migrate existing Tempest tests to run with the new policies.
  • We agreed to have two ways to test the policies:
    1. Tempest-like tests in tempest plugins with the complete operation, verifying things in the response, not just the policy return code. It depends on the project whether they want to implement such tests.
    2. Unit/functional tests on the project's side.
  • Document both ways so that projects can adopt the most suitable one.

  How to remove tempest plugin sanity BLACKLIST

We have a tempest plugin blacklist. It should be removed in the future if possible. Some of the entries shouldn't be tempest plugins at all because they're just neutron stadium things which have already moved to neutron-tempest-plugin but still exist in the original repos. Some of them are less active. Remove the below plugins from the BLACKLIST:

  • openstack/networking-generic-switch needs to be checked (setup.py/cfg?)

Action: 

  • Add the start date in the blacklist doc so that we know how long a plugin has been blacklisted.
  • After 60 days: send an email notification to openstack-discuss, the PTL, the maintainer and the TC to either fix it or remove it from governance.

  Python 2.7 drop plan

We discussed the next steps to drop py2 from Tempest and the other QA tools.

AGREE:

  • This will be done before milestone 2.
  • Create a new tag for python 2.7 saying it is the last tag, and document that this Tempest tag needs the Train u-c.
  • Test the Tempest tag with Train u-c; if it fails then we will discuss.
  • TripleO and OSA are going to use CentOS 8 for train and master.

  Adding New glance tests to Tempest

We discussed testing the new glance v2 APIs and features. Below are the glance features and the agreed points on how to test them.

  • Hide old images: A test can be added in Tempest. Hide the image and try to boot a server from the image in scenario tests.
  • Delete barbican secrets from glance images: This test belongs in barbican-tempest-plugin, which can be run as part of the barbican gate using an existing job. Running the barbican job on the glance gate is not required; we can add a new job (multiple stores) on the glance gate which can run this plus other new feature tests.
  • Multiple stores: The DevStack patch is already up; add a new zuul job to set up multiple stores and run it on the glance gate with api and scenario tests. gmann to set up the zuulv3 job for that.

  Tempest volunteers for reviewing patches

We’ve noticed that the number of merged patches in October was lower than in September and much lower than during the summer. This was also brought up in the feedback sessions. There is no perfect solution for this; nowadays QA has fewer active core developers. We encourage people to bring up critical or stuck patches in office hours.

  Improving Tempest cleanup

Tempest cleanup is not very stable and not a perfect design. We have a spec up to redesign it but could not get consensus on it. I am OK with moving to a resource prefix with a UUID. We should also extend the cleanup tool to cover plugins.
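
For context, the existing tool is normally driven like this (a sketch of the standard workflow; adjust to your environment):

tempest cleanup --init-saved-state   # record pre-existing resources before the test run
# ... run tempest ...
tempest cleanup --dry-run            # report leaked resources without deleting them
tempest cleanup                      # delete anything not recorded in the saved state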

  Ussuri Priority & Planning

This was the last session of the PTG, and it could not happen on Wednesday due to the strict closing-time policy of the conference venue, which I really liked. Time-based working is much needed for IT people :). We met on Thursday morning in the coffee area and discussed priorities for the Ussuri cycle. The QA Ussuri Priority Etherpad has the priority items with assignees.

See you in Vancouver!

by Ghanshyam Mann at November 24, 2019 11:51 PM

Open Infrastructure Summit: QA Project Updates, Shanghai 2019

Open Infrastructure Summit, Shanghai 2019

        OpenStack QA – Project Update:  Wednesday, November 6, 10:15am-10:30am

This time there was no video recording for the Project Updates. Here are the complete slides: QA Project Update.

Train cycle Stats:

by Ghanshyam Mann at November 24, 2019 12:31 AM

November 19, 2019

Mirantis

Create and manage an OpenStack-based KaaS child cluster

Deploying and managing Kubernetes clusters doesn't have to be complicated. Here's how to do it with Mirantis Kubernetes as a Service (KaaS).

by Nick Chase at November 19, 2019 05:02 PM

November 18, 2019

StackHPC Team Blog

High Performance Ethernet for HPC – Are we there yet?

Recently there has been a resurgence of interest in the use of Ethernet for HPC workloads, most notably following Cray's recent Slingshot announcements. In this article I examine some of the history of Ethernet in HPC and look at some of its advantages within modern HPC clouds.

Of course Ethernet has been the mainstay of many organisations involved in High Throughput Computing and large-scale cluster environments (e.g. geophysics, particle physics, etc.), although it does not (generally) hold the mind-share in organisations where conventional HPC workloads predominate, notwithstanding the fact that for many of these environments the operational workload for a particular application rarely goes above a small to moderate number of nodes. Here Infiniband has held sway for many years now. A recent look at the TOP500 gives some indication of the spread of Ethernet vs. Infiniband vs. custom or proprietary interconnects for both system and performance share, or as I often refer to them, the price-performance and performance segments of the HPC market.

Ethernet share of the TOP500

My interest in Ethernet was piqued some 15-20 years ago because it is a standard, and very early on there were mechanisms to obviate kernel overheads which allowed some level of scalability even back in the days of 1Gbps. This meant that even then one could exploit Landed-on-Motherboard network technology instead of more expensive PCI add-in cards. Since then we have moved to 10Gbps and beyond, and I coincidentally joined Gnodal (later acquired by Cray); RDMA enablement (through RoCE and iWarp) allowed standard MPI environments to be supported, and with the 25, 50 and 100Gbps implementations, bandwidth and latency are promised on par with Infiniband. As Ethernet is a standard, we would expect a healthy ecosystem of players within both the smart NIC and switch markets to flourish. For most switches such support is now standard (see next section). In terms of rNICs, Broadcom, Chelsio, Marvell and Mellanox currently offer products supporting either or both of the RDMA Ethernet protocols.

Pause for Thought (Pun Intended)

I think the answer to the question “are we there yet” is (isn't it always) going to be “it depends”. That “depends” will largely be influenced by the market's segmentation into the Performance, Price-Performance and Price regimes. The question is whether Ethernet can address the “Price” and “Price-Performance” areas, as opposed to the “Performance” region where some of the deficiencies of Ethernet RDMA may well be exposed, e.g. multi-switch congestion at large scale; for moderate-sized clusters with nodes spanning only a single switch it may well be a better fit.

So for example, consider a cluster of 128 nodes (minus nodes for management, access and storage): if it were possible to assess that 25GbE was sufficient versus 100Gbps EDR, then I could build the system from a single 32-port 100GbE switch (using break-out cables: 32 ports × 4-way 25GbE break-out = 128 × 25GbE ports) as opposed to multiple 36-port EDR switches; if I follow the standard practice of over-subscription with the latter, I would end up with similar cross-sectional bandwidth to the single Ethernet switch anyway. Of course, within the bounds of a single switch the bandwidth would be higher for IB. I guess down the line, with 400GbE devices coming to a data centre soon, this balance will change.

Recently I had the chance to revisit this when running test benchmarks on a bare-metal OpenStack system being used for prototyping for the SKA (I'll come on to OpenStack a bit later on; for now, note that this system runs OpenStack to prototype an operating environment for the Science Data Processing Platform of the SKA).

I wanted to stress-test the networks, compute nodes and to some extent the storage. StackHPC operate the system as a performance prototype platform on behalf of astronomers across the SKA community and so ensuring performance is maintained across the system is critical. The system, eponymously named ALaSKA, looks like this.

ALaSKA - A la SKA

ALaSKA is used to software-define various platforms of interest to various aspects of the SKA-Science Data Processor. The two predominant platforms of interest currently are a Container Orchestration environment (previously Docker-Swarm but now Kubernetes) and a Slurm-as-a-Service HPC platform.

Here we focus on the latter of these, which gives us a good opportunity to compare performance across 100G IB vs 25G RoCE vs 25Gbps TCP vs 10G (a network not shown in the above diagram, used for provisioning). First let us look more closely at the Slurm PaaS. From the base compute, storage and network infrastructure we use OpenStack Kayobe to deploy the OpenStack control plane (based on Kolla-Ansible) and then marshal the creation of bare-metal compute nodes via the OpenStack Ironic service. The flow looks something like this, with the Ansible control host being used to configure OpenStack (via a Bifrost service running on the seed node) as well as to configure the network switches. GitHub provides the source repositories.

ALaSKA - A la SKA

Further Ansible playbooks together with OpenStack Heat permit the deployment of the Slurm platform, based on the latest OpenHPC image and various high performance storage subsystems, in this case using BeeGFS Ansible playbooks. The graphic above depicts the resulting environment with the addition of OpenStack Monasca Monitoring and Logging Service (depicted by the lizard logo). As we will see later on, this provides valuable insight to system metrics (for both system administrators and the end user).

So let us assume that we first want to address the price-performance and price driven markets. At scale we need to be concerned about East-West traffic congestion between switches, although this can be somewhat mitigated by the fact that with modern 100GbE switches we can break out to 25/50GbE, which increases the arity of a single switch (and likely reduces congestion). Of course, this means we need to be able to justify the reduction in NIC bandwidth. And if the total system spans only a single switch then congestion may not be an issue at all, although further work may be required to understand end-point congestion.

To test the system's performance I used (my preference) HPCC and OpenFOAM as two benchmark environments. All tests used gcc, MKL and openmpi3 and no attempt was made to further optimise the applications. After all, all I want to do is run comparative tests of the same binary, changing run-time variables to target the underlying fabric. For openmpi, this can be achieved with the following (see below). The system uses an OpenHPC image. At the BIOS level, the system has hyperthreading enabled, so I was careful to ensure that process placement pinned only half the number of available slots (I'm using Slurm) and mapped by CPU. This is important to know when we come to examine the performance dashboards below. Here are the specific mca parameters for targeting the fabrics.

DEV=" roce ibx eth 10Geth"
for j in $DEV;
do

if [ $j == ibx ]; then
MCA_PARAMS="--bind-to core --mca btl openib,self,vader  --mca btl_openib_if_include mlx5_0:1 "
fi
if [ $j == roce ]; then
MCA_PARAMS="--bind-to core --mca btl openib,self,vader  --mca btl_openib_if_include mlx5_1:1
fi
if [ $j == eth ]; then
MCA_PARAMS="--bind-to core --mca btl tcp,self,vader  --mca btl_tcp_if_include p3p2"
fi
if [ $j == 10Geth ]; then
MCA_PARAMS="--bind-to core --mca btl tcp,self,vader  --mca btl_tcp_if_include em1"
fi
if [ $j == ipoib ]; then
MCA_PARAMS="--bind-to core --mca btl tcp,self,vader  --mca btl_tcp_if_include ib0"
fi
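
The launch step and the end of the loop are not shown above; a minimal sketch of how it might be completed is below (the hpcc binary path and task count are placeholders rather than the exact invocation used on ALaSKA):

# launch HPCC over the selected fabric
mpirun $MCA_PARAMS -np $SLURM_NTASKS ./hpcc
mv hpccoutf.txt hpccoutf-$j.txt   # keep one result file per fabric
done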

In the results below, I’m comparing the performance across each network using HPCC for a size of 8 nodes (up to 256 cores, albeit 512 virtual cores are available as described above). I think this would cover the vast majority of cases in Research Computing.

Results

HPCC Benchmark

The results for major operations of the HPCC suite are shown below together with a personal narrative of the performance. A more thorough description of the benchmarks can be found here.

8 nodes 256 cores

Benchmark                            10GbE (TCP)  25GbE (TCP)  100Gb IB  25GbE RoCE
HPL_Tflops                                 3.584        4.186     5.476       5.233
PTRANS_GBs                                 5.656       16.458    44.179      17.803
MPIRandomAccess_GUPs                       0.005        0.004     0.348       0.230
StarFFT_Gflops                             1.638        1.635     1.636       1.640
SingleFFT_Gflops                           2.279        2.232     2.343       2.322
MPIFFT_Gflops                             27.961       62.236   117.341      59.523
RandomlyOrderedRingLatency_usec           87.761      100.142     3.054       2.508
RandomlyOrderedRingBandwidth_GBytes        0.027        0.077     0.308       0.092
  • HPL – We can see here that it is evenly balanced between low latency and bandwidth, with RoCE and IB on a par even with the reduced bandwidth of RoCE. In one sense this performance mirrors the graphic shown above: in terms of HPL, Ethernet occupies ~50% of the share of total clusters, which is not matched by its performance share.
  • PTRANS – Performance pretty much in line with bandwidth.
  • GUPS – Latency dominated. IB wins by some margin.
  • STARFFT – Embarrassingly parallel (HTC use case), so no network effect.
  • SINGLEFFT – No effect, no comms.
  • MPIFFT – Heavily bandwidth dominated; see the effect of 100 vs 25 Gbps (no latency effect).
  • Random Ring Latency – See the effect of RDMA vs. TCP. Not sure why RoCE is better than IB, but it may be due to the random ordering?
  • Random Ring B/W – In line with the 100Gbps (IB) vs 25Gbps (RDMA) vs TCP networks.

OpenFoam

I took the standard motorbike benchmark and ran it on 128 (4 nodes) and 256 (8 nodes) cores on the same networks as above. I did not change the mesh sizing between runs, and thus at higher processor counts communication will be more imbalanced. The results are shown below, showing very little difference between the RDMA networks despite the bandwidth difference.

Nodes (Processors)  100Gbps IB  25Gbps RoCE  25Gbps TCP  10Gbps TCP
8 (256)                  87.64        93.35      560.37      591.23
4 (128)                  99.83       101.49      347.19      379.32

Elapsed Time in Seconds. NB the increase in time for TCP when running on more processors!

Future Work

So at present I have only looked at MPI communication. The next big thing to look at is storage, where the advantages of Ethernet need to be assessed not only in terms of performance but also in terms of the natural advantage the Ethernet standard has in connectivity for many network-attached devices.

Why OpenStack

As was mentioned above, one of the prototypical aspects of the ALaSKA system is to model operational aspects of the Science Data Processor element of the SKA. A good description of the SDP and the operational scenarios is given in the architectural description of the system. A description of the architecture and that prototyping can be found here.

Using Ethernet, and in particular High Performance Ethernet (“HPC Ethernet” in the parlance of Cray), holds a particular benefit in the case of on-premise cloud, as infrastructure can be isolated between multiple tenants. For IB and OPA this can be achieved using ad-hoc methods specific to the respective network; for Ethernet, however, multi-tenancy is native.

For many HPC scenarios, multi-tenancy is not important, nor even a requirement. For others, it is key and mandatory, e.g. secure clouds for clinical research. One aspect of multi-tenancy is shown in the analysis of the results, where we use OpenStack Monasca (a multi-tenant monitoring and logging service) and Grafana dashboards. More information on the architecture of Monasca can be found in a previous blog article.

Appendix – OpenStack Monasca Monitoring O/P

HPCC

The plot below shows CPU usage and network bandwidth for the runs of HPCC, using a Grafana dashboard and OpenStack Monasca monitoring-as-a-service. The 4 epochs correspond to the IB, RoCE, 25Gbps (TCP) and 10Gbps (TCP) runs. The total CPU usage tops out at 50% as these are HT-enabled nodes mapped by core with 1 thread per core, so we are only consuming 50% of the available resources. Network bandwidth is shown for 3 of the epochs: “Inbound ROCE Network Traffic”, “Inbound Infiniband Network Traffic” and “Inbound Bulk Data Network Traffic” – Bulk Data Network refers to an erstwhile name for the ingest network of the SDP.

HPCC performance data in Monasca

For the case of CPU usage, a reduction in performance is observed for the TCP cases. This is further evidenced by a second plot of the system CPU time, which shows heavy system overhead across the 4 separate epochs.

HPCC CPU performance data in Monasca

by John Taylor at November 18, 2019 02:00 AM

November 17, 2019

Christopher Smart

Use swap on NVMe to run more dev KVM guests, for when you run out of RAM

I often spin up a bunch of VMs for different reasons when doing dev work and unfortunately, as awesome as my little mini-ITX Ryzen 9 dev box is, it only has 32GB RAM. Kernel Samepage Merging (KSM) definitely helps, however when I have half a dozen or so VMs running and chewing up RAM, the kernel's Out Of Memory (OOM) killer will start executing them, like this.

[171242.719512] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/machine.slice/machine-qemu\x2d435\x2dtest\x2dvm\x2dcentos\x2d7\x2d00.scope,task=qemu-system-x86,pid=2785515,uid=107
[171242.719536] Out of memory: Killed process 2785515 (qemu-system-x86) total-vm:22450012kB, anon-rss:5177368kB, file-rss:0kB, shmem-rss:0kB
[171242.887700] oom_reaper: reaped process 2785515 (qemu-system-x86), now anon-rss:0kB, file-rss:68kB, shmem-rss:0kB

If I had more slots available (which I don't) I could add more RAM, but that's actually pretty expensive, plus I really like the little form factor. So, given it's just dev work, a relatively cheap alternative is to buy an NVMe drive and add a swap file to it (or dedicate the whole drive). This is what I've done on my little dev box (actually I bought it with an NVMe drive, so adding the swapfile came for free).
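
Setting that up is just the usual swap file dance; a minimal sketch, assuming the NVMe filesystem is mounted at /nvme (path and size are placeholders):

sudo dd if=/dev/zero of=/nvme/swapfile bs=1M count=102400 status=progress   # 100GB file
sudo chmod 600 /nvme/swapfile
sudo mkswap /nvme/swapfile
sudo swapon /nvme/swapfile
echo '/nvme/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots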

Of course the number of VMs you can run depends on the amount of RAM each VM actually needs for what you’re running on it. But whether I’m running 100 small VMs or 10 large ones, it doesn’t matter.

To demonstrate this, I spin up a bunch of CentOS 7 VMs at the same time and upgrade all packages. Without swap I could comfortably run half a dozen VMs, but any more than that and they would start getting killed. With a 100GB swap file I am able to get about 40 going!

Even with pages swapping in and out, I haven't really noticed any performance decrease, and there is negligible CPU time wasted waiting on disk I/O when using the machines normally.
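
If you want to sanity-check that on your own box, the standard tools are enough to watch swap activity and I/O wait:

swapon --show   # confirm the swap file is active and how much is in use
free -h         # overall memory and swap usage
vmstat 1        # watch the si/so (swap in/out) and wa (I/O wait) columns live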

The main advantage for me is that I can keep lots of VMs around (or spin up dozens) in order to test things, without having to juggle active VMs or hoping they won’t actually use their memory and have the kernel start killing my VMs. It’s not as seamless as extra RAM would be, but that’s expensive and I don’t have the slots for it anyway, so this seems like a good compromise.

by Chris at November 17, 2019 07:26 AM

November 15, 2019

Slawek Kaplonski

My Summary of OpenStack PTG in Shanghai

This is my summary of the OpenStack PTG which took place in Shanghai in November 2019. It is a brief summary of all the discussions we had in the Neutron room during the 3-day event. Onboarding: slides from the onboarding session can be found here. In my opinion the onboarding went well; there were around 20 (or even more) people in the room during this session. Together with Miguel Lavalle we gave a talk about

November 15, 2019 12:56 PM

StackHPC Team Blog

Shanghai Open Infrastructure Summit: OpenStack Goes East

Shanghai was anticipated to be an impressive backdrop to the latest Open Infrastructure summit, and the mega-city (and the fabulous conference centre) did not disappoint.

The dual-language summit worked out well enough. Parallel tracks in different languages enabled every attendee to get something from the conference. The fishbowl sessions were mostly conducted in English - occasionally bilingual speakers would flip between languages, to foster inclusivity.

The Open Infra Shanghai Skyline

The Future of OpenStack

In line with recent trends, the Shanghai Summit was smaller and subject to less attention from vendors. However, analysis of the scale of adoption of OpenStack and related open infrastructure technologies showed that the market continues to grow, and is predicted to carry on doing so.

OpenStack market projections 451 Research

This confirmed StackHPC's own view, that the OpenStack market, now shorn of much of its initial hype, is maturing alongside the project to become the de facto workhorse of private cloud infrastructure.

The Scientific SIG

Unfortunately the location and scheduling did not work out well for the Scientific SIG. Usually, the SIG events at open infrastructure summits draw a hundred or more attendees for discussion that could run far beyond the allotted time. This time, through budgetary pressures, issues with clearance for travel, placement in the developer schedule after the main summit, or various other reasons, attendance fell well short. Nevertheless, it was a pleasure to meet and chat with the people who made it.

Now THAT's a Train!


All in all, a great summit. StackHPC is already looking forward to Vancouver in June 2020!

Get in touch

If you would like to get in touch we would love to hear from you. Reach out to us via Twitter or directly via our contact page.

by Stig Telfer at November 15, 2019 09:00 AM

November 14, 2019

Nate Johnston

Shanghai PTG Summary - Remote

I attended the Neutron meetings for the OpenInfra PTG in Shanghai last week. I was not in Shanghai, so I participated entirely remotely over BlueJeans. Remote Participation: Typically I would work most of a day - 5-6 hours with a nap in the middle - and then be on the PTG for 3-5 hours in the evening. The timeshift was such that the scheduled block of meetings started at 8:00pm my time and ended at 3:30am.

November 14, 2019 07:55 PM

Ben Nemec

Oslo in Shanghai

Despite my trepidation about the trip (some of it well-founded!), I made it to Shanghai and back for the Open Infrastructure Summit and Project Teams Gathering. I even managed to get some work done while I was there. :-)

First, I recommend reading the opening of Colleen Murphy's blog post about the event (and the rest of it too, if you have any interest in what Keystone is up to). It does an excellent job of describing the week at a high level. To summarize in my own words, the energy of this event was a little off. Many regular contributors were not present because of the travel situation and there was less engagement from local contributors than I would have hoped for. However, that doesn't mean nothing good came out of it!

In fact, it was a surprisingly active week for Oslo, especially given that only myself and two other cores were there and we had limited discussion within the team. It turns out Oslo was a popular topic of conversation in various Forum sessions, particularly oslo.messaging. This led to some good conversation at the PTG and a proposal for a new Oslo library. Not only were both Oslo summit sessions well attended, but good questions were asked in both so people weren't just there waiting for the next talk. ;-) In fact, I went 10 minutes over time on the project update (oops!), in part because I hadn't really planned time for questions since I've never gotten any in the past. Not complaining though.

Read on for more detail about all of this.

oslo.messaging drivers

It should come as no surprise to anyone that one of the major pain points for OpenStack operators is RabbitMQ administration. Rabbit is a frequent bottleneck that limits the scale of deployed clouds. While it should be noted that this is not always Rabbit's fault, scaling of the message queue is a problem almost everyone runs into at some point when deploying large clouds. If you don't believe me, ask someone how many people attended the How we used RabbitMQ in wrong way at a scale presentation during the summit (which I will talk more about in a bit). The room was packed. This is definitely a topic of interest to the OpenStack community.

A few different solutions to this problem have been suggested. First, I'll talk about a couple of new drivers that have been proposed.

NATS

This was actually submitted to oslo.messaging even before the summit started. It's a new driver that uses the NATS messaging system. NATS makes some very impressive performance claims on its site, notably that it has around an order of magnitude higher throughput than RabbitMQ. Anybody interested in being able to scale their cloud 10x just by switching their messaging driver? I thought so. :-)

Now, this is still in the early discussion phase and there are some outstanding questions surrounding it. For one, the primary Python driver is not compatible with Eventlet (sigh...) which makes it unusable for oslo.messaging. There does exist a driver that would work, but it doesn't seem to be very maintained and as a result we would likely be taking on not just a new oslo.messaging driver but also a new NATS library if we proceed with this. Given the issues we've had in the past with drivers becoming unmaintained and bitrotting, this is a non-trivial concern. We're hoping to work with the driver proposers to make sure that there will be sufficient staffing to maintain this driver in the long run. If you are interested in helping out with this work please contact us ASAP. Currently it is being driven by a single contributor, which is likely not sustainable.

We will also need to ensure that NATS can handle all of the messaging patterns that OpenStack uses. One of the issues with previous high performance drivers such as ZeroMQ or Kafka was that while they were great at some things, they were missing important functionality for oslo.messaging. As a result, that functionality either had to be bolted on (which reduces the performance benefits and increases the maintenance burden) or the driver had to be defined as notification-only, in which case operators end up having to deploy multiple messaging systems to provide both RPC and notifications. Even if the benefits are worth it, it's a hard sell to convince operators to deploy yet another messaging service when they're already struggling with the one they have. Fortunately, according to the spec the NATS driver is intended to be used for both so hopefully this won't be an issue.

gRPC

In one of the sessions, I believe "Bring your crazy idea", a suggestion was made to add a gRPC driver to oslo.messaging as well. Unfortunately, I think this is problematic because gRPC is also not compatible with Eventlet, and I'm not sure there's any way to make it work. It's also not clear to me that we need multiple alternatives to RabbitMQ. As I mentioned above, we've had problems in the past with alternative drivers not being maintained, and the more drivers we add the more maintenance burden we take on. Given that the oslo.messaging team is likely shrinking over the next cycle, I don't know that we have the bandwidth to take on yet another driver.

Obviously if someone can do a PoC of a gRPC driver and show that it has significant benefits over the other available drivers then we could revisit this, but until that happens I consider this a non-starter.

Out-of-tree Drivers

One interesting suggestion that someone made was to implement some of these proposed drivers outside of oslo.messaging. I believe this should be possible with no changes to oslo.messaging because it already makes use of generic entry points for defining drivers. This could be a good option for incubating new drivers or even as a longer term solution for drivers that don't have enough maintainers to be included in oslo.messaging itself. We'll need to keep this option in mind as we discuss the new driver proposals.
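To illustrate what an out-of-tree driver might look like: the package and module names below are invented, but the oslo.messaging.drivers entry point namespace is the one the library already uses to discover its in-tree drivers, so a separate package could (in principle) register a driver like this in its setup.cfg:

# setup.cfg of a hypothetical out-of-tree driver package
[metadata]
name = oslo-messaging-nats

[entry_points]
oslo.messaging.drivers =
    nats = oslo_messaging_nats.driver:NATSDriver

Because the driver is selected by the scheme of transport_url, installing such a package and setting transport_url = nats://... would be enough for oslo.messaging to load it, the same way rabbit:// selects the in-tree RabbitMQ driver today.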

Reduce the amount of RPC in OpenStack

This also came out of the crazy idea session, but I don't recall that there was much in the way of specifics (I was distracted chatting with tech support in a failed attempt to get my cell phone working during this session). In general, reducing the load on the messaging layer would be a good thing though. If anyone has suggestions on ways to do this please propose them on the openstack-discuss mailing list.

LINE

Now we get to some very concrete solutions to messaging scaling that have already been implemented. LINE gave the RabbitMQ talk I mentioned earlier and had some novel approaches to the scaling problems they encountered. I suggest watching the recording of their session when it is available because there was a lot of interesting stuff in it. For this post, I'm going to focus on some of the changes they made to oslo.messaging in their deployment that we're hoping to get integrated into upstream.

Separate Notification Targets

One important architecture decision that LINE made was to use a separate RabbitMQ cluster for each service. This obviously reduces the load on an individual cluster significantly, but it isn't necessarily the design that oslo.messaging assumes. As a result, we have only one configuration section for notifications, but in a split architecture such as the one LINE is using, you may want service-specific notifications to go to the service-specific Rabbit cluster. The spec linked in the title for this section was proposed to provide that functionality. Please leave feedback on it if this is of interest to you.
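For context, oslo.messaging can already point notifications at a different transport than RPC via the [oslo_messaging_notifications] section; a minimal sketch (hostnames invented) of what a per-service split looks like today is below. The spec above is about going further than this single notification target.

[DEFAULT]
# RPC traffic for this service
transport_url = rabbit://nova:secret@rabbit-nova-rpc:5672/

[oslo_messaging_notifications]
driver = messagingv2
# Notifications routed to a dedicated, service-specific cluster
transport_url = rabbit://nova:secret@rabbit-nova-notifications:5672/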

oslo.messaging instrumentation and oslo.metrics

One of the ways LINE determined where their messaging bottlenecks were was some instrumentation that they added to oslo.messaging to provide message-level metrics. This allowed them to get very granular data about what messages were causing the most congestion on the messaging bus. In order to collect these metrics, they created a new library that they called oslo.metrics. In essence, the oslo.messaging instrumentation calls oslo.metrics when it wants to output a metric; oslo.metrics then takes that data, converts it to a format Prometheus can understand, and serves it on an HTTP endpoint that the oslo.metrics library creates. This allowed them to connect the oslo.messaging instrumentation to their existing telemetry infrastructure.
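oslo.metrics itself is still downstream at this point, but the pattern it implements is the standard Prometheus exporter one. A minimal illustration of that pattern (using the prometheus_client library and an invented metric name, not the actual oslo.metrics code):

# Illustration of the exporter pattern only, not the oslo.metrics code.
from prometheus_client import Counter, start_http_server
import time

# A counter an exporter like this might keep per RPC exchange/method.
rpc_calls = Counter(
    'oslo_messaging_rpc_calls_total',
    'Number of RPC calls observed',
    ['exchange', 'method'],
)

def record_rpc_call(exchange, method):
    # The messaging instrumentation would call something like this
    # every time a message is sent or processed.
    rpc_calls.labels(exchange=exchange, method=method).inc()

if __name__ == '__main__':
    start_http_server(9000)  # serves /metrics for Prometheus to scrape
    while True:
        record_rpc_call('nova', 'build_and_run_instance')
        time.sleep(5)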

Interestingly, this concept came up in other discussions throughout the week as well, so we're hoping that we can get oslo.metrics upstreamed (currently it is something they implemented downstream that is specific to their deployment) and used in more places. Another interesting related possibility was to add a new middleware to oslo.middleware that could do a similar thing for the API services and potentially provide useful performance metrics from them.

We had an extended discussion with the LINE team about this at the Oslo PTG table, and the next steps will be for them to fill out a spec for the new library and hopefully make their code changes available for review. Once that is done, we had commitments from a number of TC members to review and help shepherd this work along. All in all, this seems to be an area of great interest to the community and it will be exciting to see where it goes!

Policy Improvements

I'm going to once again refer you to Colleen's post, specifically the "Next Steps for Policy in OpenStack" section since this is being driven more by Keystone than Oslo. However, one interesting thing that was discussed with the Nova team that may affect Oslo was how to manage these changes if they end up taking more than one cycle. Because the oslo.policy deprecation mechanism is used to migrate services to the new-style policy rules, operators will start seeing quite a few deprecation messages in their logs once this work starts. If it takes more than one cycle then that means they may be seeing deprecations for multiple cycles, which is not ideal.
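To make the mechanism concrete, here is roughly what a new-style policy rule with a deprecated fallback looks like in oslo.policy. The rule name and check strings are invented for illustration, and the keyword arguments reflect the API as of this writing, so treat it as a sketch rather than a recipe:

from oslo_policy import policy

# The old rule that operators may have overridden in their policy files.
deprecated_show = policy.DeprecatedRule(
    name='example_api:widgets:show',
    check_str='rule:admin_or_owner',
)

rules = [
    policy.DocumentedRuleDefault(
        name='example_api:widgets:show',
        check_str='role:reader and project_id:%(project_id)s',
        description='Show a widget.',
        operations=[{'method': 'GET', 'path': '/widgets/{widget_id}'}],
        scope_types=['project'],
        # The old rule is still honoured, but evaluating it emits the
        # deprecation warnings operators will be seeing in bulk.
        deprecated_rule=deprecated_show,
        deprecated_reason='Widget policies now support default roles.',
        deprecated_since='ussuri',
    ),
]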

Currently Nova's plan is to queue up all of their policy changes in one big patch series of doom and once they are all done merge the whole thing at once. It remains to be seen how manageable such a patch series that touches code across the project will be though. If it proves untenable, we may need to implement some sort of switch in oslo.policy that would allow deprecations to be temporarily disabled while this work is ongoing, and then when all of the policy changes have been made the switch could be flipped so all of the deprecations take effect at once. As of now I have no plans to implement such a feature, but it's something to keep in mind as the other service projects get serious about doing their policy migrations.

oslo.limit

The news is somewhat mixed on this front. Unfortunately, the people (including me) who have been most involved in this work from the Keystone and Oslo sides are unlikely to be able to drive it to completion due to changing priorities. However, there is still interest from the Nova side, and I heard rumors at the PTG that there may be enough operator interest in the common quota work that they would be able to have someone help out too. It would be great if this is still able to be completed as it would be a shame to waste all of the design work and implementation of unified limits that has already been done. The majority of the initial API is available for review and just needs some massaging to be ready to merge. Once that happens, projects can start consuming it and provide feedback on whether it meets their needs.

Demo of Oslo Tools That Make Life Easier for Operators

A bit of shameless self-promotion, but this is a presentation I did in Shanghai. The recording isn't available yet, but I'll link it once it is. In essence, this was my attempt to evangelize some Oslo tools that have been added somewhat recently but people may not have been aware of. It covers what the tools are good for and how to actually use them.

Conclusion

As I tweeted on the last day of the PTG, this was a hard event for me to leave. Changes in my job responsibilities mean this was likely my last summit and my last opportunity to meet with the OpenStack family face-to-face. Overall it was a great week, albeit with some rough edges, which is a double-edged sword. If the week had gone terribly maybe I wouldn't have been so sad to leave, but on the other hand it was nice to go out on a high note.

If you made it this far, thanks! Please don't hesitate to contact me with any comments or questions.

by bnemec at November 14, 2019 06:35 PM

SUSE Conversations

The Brains Behind the Books – Part VII: Alexandra Settle

The content of this article has been contributed by Alexandra Settle, Technical Writer at the SUSE Documentation Team. It is part of a series of articles focusing on SUSE Documentation and the great minds that create the manuals, guides, quick starts, and many more helpful documents.       A Dream of  Ice Cream Shops and Lego […]

The post The Brains Behind the Books – Part VII: Alexandra Settle appeared first on SUSE Communities.

by chabowski at November 14, 2019 12:23 PM

November 13, 2019

Colleen Murphy

Shanghai Open Infrastructure Forum and PTG

The Open Infrastructure Summit, Forum, and Project Teams Gathering was held last week in the beautiful city of Shanghai. The event was held in the spirit of cross-cultural collaboration and attendees arrived with the intention of bridging the gap with a usually faraway but significant part of the OpenStack community …

by Colleen Murphy at November 13, 2019 01:00 AM

Sean McGinnis

November 2019 OpenStack Board Notes

The Open Infrastructure Summit was held in mainland China for the first time the week of November 4th, 2019, in Shanghai. As usual, we took advantage of the opportunity of having so many members in one place by having a Board of Directors meeting on Sunday, November 3.

Attendance was a little lighter due to visa challenges, travel budgets, and other issues. But we still had a quorum with a lot of folks in the room, and I’m sure it was a nice change for our Chinese board members and others from the APAC region.

The original meeting agenda is published on the wiki as usual.

OSF Updates

Following the usual pattern, Jonathan Bryce kicked things off with an update of Foundation and project activity.

One interesting thing that really stood out to me, which Jonathan also shared the next day in the opening keynotes, was an analyst report putting OpenStack's market at $7.7 billion in 2020. I am waiting for those slides to be published, but I think this really showed that despite the decrease in investment by companies in the development of OpenStack, its adoption is stable and continuing to grow.

This was especially highlighted in China, with companies like China UnionPay, China Mobile, and other large companies from other industries increasing their use of OpenStack, and public clouds like Huawei and other local service providers basing their services on top of OpenStack.

I can definitely state from experience after that week that access to the typical big 3 public cloud providers in the US is a challenge through the Great Firewall. Being able to base your services on top of a borderless open source platform like OpenStack is a great option given the current political pressures. A community-based solution, rather than a foreign tech company's offerings, probably makes a lot of sense and is helping drive this adoption.

Of course, telecom adoption is still growing as well. I'm not as involved in that space, but it really seems like OpenStack is becoming the de facto standard for having a programmable infrastructure to base dynamic NFV solutions on top of, both directly with VMs and bare metal, and as a locally controlled platform to serve as the underlying infrastructure for Kubernetes.

Updates and Community Reports

StarlingX Progress Report

The StarlingX project has made a lot of progress over the last several months. They are getting closer and closer to the latest OpenStack code. They have been actively working on getting their custom changes merged upstream so they do not need to continue maintaining a fork. So far, they have been able to get a lot of changes in to various projects. They hope to eventually be able to just deploy standard OpenStack services configured to meet their needs, focusing instead on the services on top of OpenStack that make StarlingX attractive and a great solution for edge infrastructure.

Indian Community Update

Prakash Ramchandran gave an update on the various meetups and events being organized across India. This is a large market for OpenStack. Recently approved government initiatives could make this an ideal time to help nurture the Indian OpenStack community.

I’m glad to see all of the activity that Prakash has been helping support there. This is another region where I expect to see a lot of growth in OpenStack adoption.

Interop Working Group

Egle gave an update on the Interop WG activity, and the second set of 2019 changes was approved. Nothing too exciting there, with just minor updates to the interop requirements.

The larger discussion was about the need for, and the health of, the Interop WG. Chris Hoge was a very active contributor to this, but he recently left the OSF, and the OpenStack community, to pursue a different opportunity. Egle Sigler is really the only one left on the team, and she has shared that she would not be able to do much more with the group other than keeping the lights on.

This team is responsible for the guidelines that must be followed for someone to certify that their service or distribution of OpenStack meets the minimum functionality requirements to be consistent with other OpenStack deployments. This certification is needed to be able to use the OpenStack logo and be called “OpenStack Powered”.

I think there was pretty unanimous agreement that this kind of thing is still very important. Users need to be able to have a consistent user experience when moving between OpenStack-based clouds. Inconsistency would lead to unexpected behaviors or responses and a poor user experience.

For now it is a call for help and to raise awareness. It did make me think about how we’ve been able to decentralize some efforts within the community, like moving documentation into each team’s repos rather than having a centralized docs team and docs repo. I wonder if we can put some of this work on the teams themselves to mark certain API calls as “core”, with testing in place to ensure none of these APIs are changed or start producing different results. Something to think about at least.

First Contact SIG Update

The First Contact SIG works on things to make getting involved in the community easier. They’ve done a lot of work in the past on training and contributor documentation. They’ve recently added a Contributing Organization Guide that is targeted at the organization management level to help them understand how they can make an impact and help their employees to be involved and productive.

That’s an issue we’ve had to varying degrees in the past. Companies have had good intentions of getting involved, but they are not always sure where to start. Or they task a few employees to contribute without a good plan on how or where to do so. I think it will be good having a place to direct these companies to, to help them understand how to work with OpenStack and an open source community.

Troila Gold Member Application

Troila is an IT services company in China that provides a cloud product based on OpenStack to their customers. They have been using OpenStack for some time and saw the value in becoming an OSF Gold level sponsor.

As part of the Member Committee, Rob Esker and I met with them the week prior to go over their application and answer any questions and give feedback. That preview was pretty good, and Rob and I only had minor suggestions for them to help highlight what they have been doing with OpenStack and what their future plans were.

They had taken these suggestions and made updates to their presentation, and I think they did a very nice job explaining their goals. There was some discussion and additional questions from the board, but after a quick executive session, we voted and approved Troila as the latest Gold member of the OpenStack Foundation.

Combined Leadership Meeting

The second half of the day was a joint session with the Board and the Technical Committees or Technical Steering Committees of the OpenStack, StarlingX, Airship, Kata, and Zuul projects. Each team gave a community update for their respective areas.

My biggest takeaway from this was that although we are under-resourced in some areas, we really do have a large and very active community of people who really care about the things they are working on. Seeing growing adoption for things like Kata Containers and Zuul is really exciting.

Next Meeting

The next meeting will be a conference call on December 10th. No word yet on the agenda for that, but I wouldn’t expect too much being so soon after Shanghai. I expect there will probably be some buzz about the annual elections coming up.

Once available, the agenda will be published to the usual spot.

I was only able to finish out my term because the rest of the board voted to allow me to do so as an exception to the two-seat-per-company limit after I rejoined Dell halfway through the year. That exception won’t apply for the next election, so if the three of us from Dell all hope to continue, one of us isn’t going to be able to.

I’ve waffled on this a little, but at least right now, I do think I am going to run for election again. Prakash has been doing some great work with his participation in the India OpenStack community, so I will not feel too bad if I lose out to him. I do think I’ve been more integrated in the overall development community, so since an Individual Director is supposed to be a representative for the community, I do hope I can continue. That will be up to the broader community, so I am not going to worry about it. The community will be able to elect those they support, so no matter what it will be good.

by Sean McGinnis at November 13, 2019 12:00 AM

November 12, 2019

Sean McGinnis

Why is the Cinder mascot a horse?!

I have to admit, I have to laugh to myself every time I see the Cinder mascot in a keynote presentation.

Cinder horse mascot

History (or, why the hell is that the Cinder mascot!)

The reason at least a few of us find it so funny is that it’s a bit of an inside joke.

Way back in the early days of Cinder, someone from SolidFire came up with a great looking cinder block logo for the project. It was in the style of the OpenStack logo at the time and was nice and recognizable.

Cinder logo

Then around 2016, they decided it was time to refresh the OpenStack logo and make it look more modern and flat. Our old logo no longer matched the overall project, but we still loved it.

I did make an attempt to update it. I made a stylized version of the Cinder block logo using the new OpenStack logo as a basis for it. I really wish I could find it now, but I may have lost the image when I switched jobs. You may still see it on someone’s laptop - I had a very small batch of stickers made while I was still Cinder PTL.

It was soon after the OpenStack logo change that the Foundation decided to introduce mascots for each project. They asked each team to think of an animal that they could identify with. It was supposed to be a fun exercise for the teams to be able to pick their own kind of logo, with graphic designers coming up with very high quality images.

The Cinder team didn’t really have an obvious animal. At least not as obvious as a Cinder block had been. It was during one of our midcycle meetups in Ft. Collins, Co while we were brainstorming that led to our horse.

Trying to think of something that would actually represent the team, we were talking over what Cinder actually was. We were mostly all from different storage vendors. We refer to the different storage devices that are used with Cinder as backends.

Backends are also what some call butts. Butts… asses. Donkeys are also called asses. Donkey!

One or two people on the team had cultural objections to having a donkey as a mascot. They didn’t think it was a good representation of our project. So we compromised and went with a horse.

So we asked for a horse to be our mascot. The initial design they came up with was a Ferrari-looking stallion. Way too sporty and fierce for our team. Even though the OpenStack Foundation had actually published it and even created some stickers, we explained our, erm… thought process… behind coming up with the horse in the first place. The design team was great, and went back to the drawing board. The result is the back-end view of the horse that we have today. They even worked a little ‘C’ into the swish of the horse’s tail.

So that’s the story behind the Cinder logo. It’s just because we’re all a bunch of backends.

by Sean McGinnis at November 12, 2019 12:00 AM

November 11, 2019

RDO

Community Blog Round Up 11 November 2019

As we dive into the Ussuri development cycle, I’m sad to report that there’s not a lot of writing happening upstream.

If you’re one of those people waiting for a call to action, THIS IS IT! We want to hear about your story, your problem, your accomplishment, your analogy, your fight, your win, your loss – all of it.

And, in the meantime, Adam Young says it’s not that cloud is difficult, it’s networking! Fierce words, Adam. And a super fierce article to boot.

Deleting Trunks in OpenStack before Deleting Ports by Adam Young

Cloud is easy. It is networking that is hard.

Read more at https://adam.younglogic.com/2019/11/deleting-trunks-before-ports/

by Rain Leander at November 11, 2019 01:46 PM

November 09, 2019

Aptira

OSN-Day

Aptira OSN Day

The Open Networking technology landscape has evolved quickly over the last two years. How can Telco’s keep up?

Our team of Network experts have used Software Defined Networking techniques for many different use cases, including Traffic Engineering, Segment Routing, Integration and Automated Traffic Engineering, and many more, addressing many of the key challenges associated with networks, including security, volume and flexibility concerns, to provide customers with an uninterrupted user experience.

At OSN Day, we will be helping attendees to learn about the risks associated with 5G networks. Edge Compute is needed for 5G and 5G-enabled use cases, but currently 5G-enabled use cases are ill-defined and incremental revenue is uncertain. Therefore, it’s not clear what is actually required, and the Edge business case is risky. We’ll be on site explaining how to mitigate against these risks, ensuring successful network functionality through the implementation of a risk-optimised approach to 5G. You can download the full whitepaper here.

We will also have our amazingly talented Network Consultant Farzaneh Pakzad presenting in The Programmable Network breakout track. Farzaneh will be comparing, rating and evaluating each of the most popular Open Source SDN controllers in use today. This comparison will be useful for organisations to help them select the right SDN controller for their platform that matches their network design and requirements.

Farzaneh has a PhD in Software Defined Networks from the University of Queensland. Her research interests include Software Defined Networks, Cloud Computing and Network Security. During her career, Farzaneh has provided advisory service for transport SDN solutions and implemented Software Defined Networking Wide Area Network functionalities for some of Australia’s largest Telco’s.

We’ve got some great swag to giveaway and will also be running a demonstration on Tungsten Fabric as a Kubernetes CNI, so if you’re at OSN Day make sure you check out Farzaneh’s session in Breakout room 2 and also visit the team of Aptira Solutionauts in the expo room. They can help you to create, design and deploy the network of tomorrow.

Ready to move your network into the software defined future?
Automate your network with ONAP.

Find Out How

The post OSN-Day appeared first on Aptira.

by Jessica Field at November 09, 2019 12:53 PM

November 07, 2019

Adam Young

Deleting Trunks in OpenStack before Deleting Ports

Cloud is easy. It is networking that is hard.

Red Hat supports installing OpenShift on OpenStack. As a Cloud SA, I need to be able to demonstrate this, and make it work for customers. As I was playing around with it, I found I could not tear down clusters due to a dependency issue with ports.


When building and tearing down network structures with Ansible, I had learned the hard way that there were dependencies. Routers came down before subnets, and so on. But the latest round had me scratching my head. I could not get ports to delete, and the error message was no help.

I was able to figure out that the ports linked to security groups. In fact, I could unset almost all of the dependencies using the port set command line. For example:

openstack port set openshift-q5nqj-master-port-1  --no-security-group --no-allowed-address --no-tag --no-fixed-ip

However, I still could not delete the ports. I did notice that there was a trunk_details section at the bottom of the port show output:

trunk_details         | {'trunk_id': 'dd1609af-4a90-4a9e-9ea4-5f89c63fb9ce', 'sub_ports': []} 

But there is no way to “unset” that. It turns out I had it backwards: you need to delete the trunk first. A message from Kristi Nikolla:

the port is set as the parent for a “trunk” so you need to delete the trunk first

Kristi in IRC

curl -H "x-auth-token: $TOKEN" https://kaizen.massopen.cloud:13696/v2.0/trunks/

It turns out that you can do this with the CLI…at least I could.

$ openstack network trunk show 01a19e41-49c6-467c-a726-404ffedccfbb
+-----------------+----------------------------------------+
| Field           | Value                                  |
+-----------------+----------------------------------------+
| admin_state_up  | UP                                     |
| created_at      | 2019-11-04T02:58:08Z                   |
| description     |                                        |
| id              | 01a19e41-49c6-467c-a726-404ffedccfbb   |
| name            | openshift-zq7wj-master-trunk-1         |
| port_id         | 6f4d1ecc-934b-4d29-9fdd-077ffd48b7d8   |
| project_id      | b9f1401936314975974153d78b78b933       |
| revision_number | 3                                      |
| status          | DOWN                                   |
| sub_ports       |                                        |
| tags            | ['openshiftClusterID=openshift-zq7wj'] |
| tenant_id       | b9f1401936314975974153d78b78b933       |
| updated_at      | 2019-11-04T03:09:49Z                   |
+-----------------+----------------------------------------+

Here is the script I used to delete them. Notice that the status was DOWN for all of the ports I wanted gone.

# For each DOWN port, look up its parent trunk and delete the trunk.
for PORT in $( openstack port list | awk '/DOWN/ {print $2}' ); do
    TRUNK_ID=$( openstack port show $PORT -f json | jq -r '.trunk_details | .trunk_id' )
    echo port $PORT has trunk $TRUNK_ID
    openstack network trunk delete $TRUNK_ID
done

Kristi had used the curl command because he did not have the network trunk option in his CLI. Turns out he needed to install python-neutronclient first.

by Adam Young at November 07, 2019 07:27 PM

November 06, 2019

StackHPC Team Blog

Worlds Collide: Virtual Machines & Bare Metal in OpenStack

Ironic's mascot, Pixie Boots

To virtualise or not to virtualise?

If performance is what you need, then there's no debate - bare metal still beats virtual machines; particularly for I/O intensive applications. However, unless you can guarantee to keep it fully utilised, iron comes at a price. In this article we describe how Nova can be used to provide access to both hypervisors and bare metal compute nodes in a unified manner.

Scheduling

When support for bare metal compute via Ironic was first introduced to Nova, it could not easily coexist with traditional hypervisor-based workloads. Reported workarounds typically involved the use of host aggregates and flavor properties.

Scheduling of bare metal is covered in detail in our bespoke bare metal blog article (see Recap: Scheduling in Nova).

Since the Placement service was introduced, scheduling has significantly changed for bare metal. The standard vCPU, memory and disk resources were replaced with a single unit of a custom resource class for each Ironic node. There are two key side-effects of this:

  • a bare metal node is either entirely allocated or not at all
  • the resource classes used by virtual machines and bare metal are disjoint, so we could not end up with a VM flavor being scheduled to a bare metal node

A flavor for a 'tiny' VM might look like this:

openstack flavor show vm-tiny -f json -c name -c vcpus -c ram -c disk -c properties
{
  "name": "vm-tiny",
  "vcpus": 1,
  "ram": 1024,
  "disk": 1,
  "properties": ""
}

A bare metal flavor for 'gold' nodes could look like this:

openstack flavor show bare-metal-gold -f json -c name -c vcpus -c ram -c disk -c properties
{
  "name": "bare-metal-gold",
  "vcpus": 64,
  "ram": 131072,
  "disk": 371,
  "properties": "resources:CUSTOM_GOLD='1',
                 resources:DISK_GB='0',
                 resources:MEMORY_MB='0',
                 resources:VCPU='0'"
}

Note that the vCPU/RAM/disk resources are informational only, and are zeroed out via properties for scheduling purposes. We will discuss this further later on.

With flavors in place, users choosing between VMs and bare metal is handled by picking the correct flavor.
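For reference, such a flavor can be created with the standard CLI. The sketch below assumes the Ironic nodes have had their resource_class set to GOLD (which Placement exposes as CUSTOM_GOLD); adjust the class name and sizes to taste:

# Informational vCPU/RAM/disk values only; scheduling is driven by the
# custom resource class requested below.
openstack flavor create bare-metal-gold --vcpus 64 --ram 131072 --disk 371

# Request one unit of the node's resource class and zero out the
# standard classes so they are ignored during scheduling.
openstack flavor set bare-metal-gold \
  --property resources:CUSTOM_GOLD=1 \
  --property resources:VCPU=0 \
  --property resources:MEMORY_MB=0 \
  --property resources:DISK_GB=0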

What about networking?

In our mixed environment, we might want our VMs and bare metal instances to be able to communicate with each other, or we might want them to be isolated from each other. Both models are possible, and work in the same way as a typical cloud - Neutron networks are isolated from each other until connected via a Neutron router.

Bare metal compute nodes typically use VLAN or flat networking, although with the right combination of network hardware and Neutron plugins other models may be possible. With VLAN networking, assuming that hypervisors are connected to the same physical network as bare metal compute nodes, then attaching a VM to the same VLAN as a bare metal compute instance will provide L2 connectivity between them. Alternatively, it should be possible to use a Neutron router to join up bare metal instances on a VLAN with VMs on another network e.g. VXLAN.

What does this look like in practice? We need a combination of Neutron plugins/drivers that support both VM and bare metal networking. To connect bare metal servers to tenant networks, it is necessary for Neutron to configure physical network devices. We typically use the networking-generic-switch ML2 mechanism driver for this, although the networking-ansible driver is emerging as a promising vendor-neutral alternative. These drivers support bare metal ports, that is Neutron ports with a VNIC_TYPE of baremetal. Vendor-specific drivers are also available, and may support both VMs and bare metal.
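As a rough sketch of what the networking-generic-switch configuration looks like (the switch name, device type and credentials below are placeholders; see the driver's documentation for the full list of options), ML2 gains the mechanism driver plus a section per physical switch:

# ml2_conf.ini (sketch; switch details are placeholders)
[ml2]
tenant_network_types = vlan
mechanism_drivers = openvswitch,genericswitch

# One section per switch that bare metal nodes are cabled to
[genericswitch:tor-switch-1]
device_type = netmiko_dell_force10
ip = 192.0.2.10
username = admin
password = secret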

Where's the catch?

One issue that more mature clouds may encounter is around the transition from scheduling based on standard resource classes (vCPU, RAM, disk), to scheduling based on custom resource classes. If old bare metal instances exist that were created in the Rocky release or earlier, they may have standard resource class inventory in Placement, in addition to their custom resource class. For example, here is the inventory reported to Placement for such a node:

$ openstack resource provider inventory list <node UUID>
+----------------+------------------+----------+----------+-----------+----------+--------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit |  total |
+----------------+------------------+----------+----------+-----------+----------+--------+
| VCPU           |              1.0 |       64 |        0 |         1 |        1 |     64 |
| MEMORY_MB      |              1.0 |   131072 |        0 |         1 |        1 | 131072 |
| DISK_GB        |              1.0 |      371 |        0 |         1 |        1 |    371 |
| CUSTOM_GOLD    |              1.0 |        1 |        0 |         1 |        1 |      1 |
+----------------+------------------+----------+----------+-----------+----------+--------+

If this node is allocated to an instance whose flavor requested (or did not explicitly zero out) standard resource classes, we will have a usage like this:

$ openstack resource provider usage show <node UUID>
+----------------+--------+
| resource_class |  usage |
+----------------+--------+
| VCPU           |     64 |
| MEMORY_MB      | 131072 |
| DISK_GB        |    371 |
| CUSTOM_GOLD    |      1 |
+----------------+--------+

If this instance is deleted, the standard resource class inventory will become available, and may be selected by the scheduler for a VM. This is not likely to end well. What we must do is ensure that these resources are not reported to Placement. This is done by default in the Stein release of Nova, and Rocky may be configured to do the same by setting the following in nova.conf:

[workarounds]
report_ironic_standard_resource_class_inventory = False

However, if we do that, then Nova will attempt to remove inventory from Placement resource providers that is already consumed by our instance, and will receive a HTTP 409 Conflict. This will quickly fill our logs with unhelpful noise.

Flavor migration

Thankfully, there is a solution. We can modify the embedded flavor in our existing instances to remove the standard resource class inventory, which will result in the removal of the allocation of these resources from Placement. This will allow Nova to remove the inventory from the resource provider. There is a Nova patch started by Matt Riedemann which will remove our standard resource class inventory. The patch needs pushing over the line, but works well enough to be cherry-picked to Rocky.

The migration can be done offline or online. We chose to do it offline, to avoid the need to deploy this patch. For each node to be migrated:

nova-manage db ironic_flavor_migration --resource_class <node resource class> --host <host> --node <node UUID>

Alternatively, if all nodes have the same resource class:

nova-manage db ironic_flavor_migration --resource_class <node resource class> --all

You can check the instance embedded flavors have been updated correctly via the database:

sql> use nova
sql> select flavor from instance_extra;

Now (Rocky only), standard resource class inventory reporting can be disabled. After the nova compute service has been running for a while, Placement will be updated:

$ openstack resource provider inventory list <node UUID>
+----------------+------------------+----------+----------+-----------+----------+-------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+----------------+------------------+----------+----------+-----------+----------+-------+
| CUSTOM_GOLD    |              1.0 |        1 |        0 |         1 |        1 |     1 |
+----------------+------------------+----------+----------+-----------+----------+-------+

$ openstack resource provider usage show <node UUID>
+----------------+--------+
| resource_class |  usage |
+----------------+--------+
| CUSTOM_GOLD    |      1 |
+----------------+--------+

Summary

We hope this shows that OpenStack is now in a place where VMs and bare metal can coexist peacefully, and that even for those pesky pets, there is a path forward to this brave new world. Thanks to the Nova team for working hard to make Ironic a first class citizen.

by Mark Goddard at November 06, 2019 02:00 AM

November 04, 2019

Dan Smith

Start and Monitor Image Pre-cache Operations in Nova

When you boot an instance in Nova, you provide a reference to an image. In many cases, once Nova has selected a host, the virt driver on that node downloads the image from Glance and uses it as the basis for the root disk of your instance. If your nodes are using a virt driver that supports image caching, then that image only needs to be downloaded once per node, which means the first instance to use that image causes it to be downloaded (and thus has to wait). Subsequent instances based on that image will boot much faster as the image is already resident.

If you manage an application that involves booting a lot of instances from the same image, you know that the time-to-boot for those instances could be vastly reduced if the image is already resident on the compute nodes you will land on. If you are trying to avoid the latency of rolling out a new image, this becomes a critical calculation. For years, people have asked for or proposed solutions in Nova for allowing some sort of image pre-caching to solve this, but those discussions have always become stalled in detail hell. Some people have resorted to hacks like booting host-targeted tiny instances ahead of time, direct injection of image files to Nova’s cache directory, or local code modifications. Starting in the Ussuri release, such hacks will no longer be necessary.

Image pre-caching in Ussuri

Nova’s now-merged image caching feature includes a very lightweight and no-promises way to request that an image be cached on a group of hosts (defined by a host aggregate). In order to avoid some of the roadblocks to success that have plagued previous attempts, the new API does not attempt to provide a rich status result, nor a way to poll for or check on the status of a caching operation. There is also no scheduling, persistence, or reporting of which images are cached where. Asking Nova to cache one or more images on a group of hosts is similar to asking those hosts to boot an instance there, but without the overhead that goes along with it. That means that images cached as part of such a request will be subject to the same expiry timer as any other. If you want them to remain resident on the nodes permanently, you must re-request the images before the expiry timer would have purged them. Each time an image is pre-cached on a host, the timestamp for purge is updated if the image is already resident.

Obviously for a large cloud, status and monitoring of the cache process in some way is required, especially if you are waiting for it to complete before starting a rollout. The subject of this post is to demonstrate how this can be done with notifications.

Example setup

Before we can talk about how to kick off and monitor a caching operation, we need to set up the basic elements of a deployment. That means we need some compute nodes, and for those nodes to be in an aggregate that represents the group that will be the target of our pre-caching operation. In this example, I have a 100-node cloud with numbered nodes that look like this:

$ nova service-list --binary nova-compute
+--------------+--------------+
| Binary | Host |
+--------------+--------------+
| nova-compute | guaranine1 |
| nova-compute | guaranine2 |
| nova-compute | guaranine3 |
| nova-compute | guaranine4 |
| nova-compute | guaranine5 |
| nova-compute | guaranine6 |
| nova-compute | guaranine7 |
.... and so on ...
| nova-compute | guaranine100 |
+--------------+--------------+

In order to be able to request that an image be pre-cached on these nodes, I need to put some of them into an aggregate. I will do that programmatically since there are so many of them like this:

$ nova aggregate-create my-application
+----+-----------------+-------------------+-------+----------+--------------------------------------+
| Id | Name | Availability Zone | Hosts | Metadata | UUID |
+----+-----------------+-------------------+-------+----------+--------------------------------------+
| 2 | my-application | - | | | cf6aa111-cade-4477-a185-a5c869bc3954 |
+----+-----------------+-------------------+-------+----------+--------------------------------------+
$ for i in $(seq 1 95); do nova aggregate-add-host my-application guaranine$i; done
... lots of noise ...

Now that I have done that, I am able to request that an image be pre-cached on all the nodes within that aggregate by using the nova aggregate-cache-images command:

$ nova aggregate-cache-images my-application c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2

If all goes to plan, sometime in the future all of the hosts in that aggregate will have fetched the image into their local cache and will be able to use that for subsequent instance creation. Depending on your configuration, that happens largely sequentially to avoid storming Glance, and with so many hosts and a decently-sized image, it could take a while. If I am waiting to deploy my application until all the compute hosts have the image, I need some way of monitoring the process.

Monitoring progress

Many of the OpenStack services send notifications via the messaging bus (i.e. RabbitMQ) and Nova is no exception. That means that whenever things happen, Nova sends information about those things to a queue on that bus (if so configured) which you can use to receive asynchronous information about the system.

The image pre-cache operation sends start and end versioned notifications, as well as progress notifications for each host in the aggregate, which allows you to follow along. Ensure that you have set [notifications]/notification_format=versioned in your config file in order to receive these. A sample intermediate notification looks like this:

{
    'index': 68,
    'total': 95,
    'images_failed': [],
    'uuid': 'ccf82bd4-a15e-43c5-83ad-b23970338139',
    'images_cached': ['c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2'],
    'host': 'guaranine68',
    'id': 1,
    'name': 'my-application',
}

This tells us that host guaranine68 just completed its cache operation for one image in the my-application aggregate. It was host 68 of 95 total. Since the image ID we used is in the images_cached list, that means it was either successfully downloaded on that node, or was already present. If the image failed to download for some reason, it would be in the images_failed list.

In order to demonstrate what this might look like, I wrote some example code. This is not intended to be production-ready, but will provide a template for you to write something of your own to connect to the bus and monitor a cache operation. You would run this before kicking off the process; it waits for a cache operation to begin, prints information about progress, and then exits with a non-zero status code if any errors were detected. For the above example invocation, the output looks like this:

$ python image_cache_watcher.py
Image cache started on 95 hosts
Aggregate 'my-application' host 95: 100% complete (8 errors)
Completed 94 hosts, 8 errors in 2m31s
Errors from hosts:
guaranine2
guaranine3
guaranine4
guaranine5
guaranine6
guaranine7
guaranine8
guaranine9
Image c3b84ecf-43e9-4c6c-adfd-ab6db0e2bca2 failed 8 times

In this case, I intentionally configured eight hosts so that the image download would fail for demonstration purposes.
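If you would rather wire up a watcher of your own, the versioned notifications can be consumed with oslo.messaging's notification listener API. The sketch below is only a starting point: the event type prefix and the payload unwrapping are assumptions based on the sample payload above, not a drop-in replacement for the example script.

import oslo_messaging
from oslo_config import cfg


class CacheProgressEndpoint(object):
    """Print progress for image pre-cache notifications."""

    def info(self, ctxt, publisher_id, event_type, payload, metadata):
        if not event_type.startswith('aggregate.cache_images'):
            return
        # Versioned notifications wrap the fields shown earlier in a
        # nova_object.data envelope.
        data = payload.get('nova_object.data', payload)
        print('%s: host %s (%s of %s), failed=%s' % (
            event_type, data.get('host'), data.get('index'),
            data.get('total'), data.get('images_failed')))


conf = cfg.CONF
conf(project='nova')  # pick up transport_url from nova.conf
transport = oslo_messaging.get_notification_transport(conf)
targets = [oslo_messaging.Target(topic='versioned_notifications')]
# A listener pool gets its own queue so we do not steal notifications
# from other consumers on the same topic.
listener = oslo_messaging.get_notification_listener(
    transport, targets, [CacheProgressEndpoint()],
    executor='threading', pool='image-cache-watcher')
listener.start()
listener.wait()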

Future

The image caching functionality in Nova may gain more features in the future, but for now, it is a best-effort sort of thing. With just a little bit of scripting, Ussuri operators should be able to kick off and monitor image pre-cache operations and substantially improve time-to-boot performance for their users.

by Dan at November 04, 2019 07:30 PM

About

Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology you should add your OpenStack blog.
