May 24, 2019

OpenStack Superuser

What’s next: 5G network slicing with ETSI OSM 5 and OpenStack

Network slicing is an innovative network architecture technology that’s also one of the most exciting promises of 5G telecom networks. Imagine a single physical network that can be divided logically into slices for different use cases, each with its own characteristics and network requirements such as latency, bandwidth and security.

Plus, services in those different slices can be managed separately, enabling privacy and dedicated bandwidth to run critical operations. The technology also offers huge performance boosts for specific use cases, such as industrial internet of things, autonomous driving, and more.

There are various interpretations of network slicing (NS) standardization from different vendors. Meanwhile, a number of proof-of-concepts are underway to test the operation, performance and basic, ready-to-run architectures for NS.

PoCs and analysis conducted in communities like OpenStack and ETSI OSM have come to some conclusions around network slicing and its readiness. Experts note that end-to-end virtualization, dynamic centralized orchestration and quality-of-service manageability are basic requirements for successful NS implementation.

Last year at the OpenStack Summit in Vancouver, architects Curtis Collicutt and Corey Erickson evaluated OpenStack networking projects for network slicing implementation. More recently at the Mobile World Congress 2019, Telefonica and Telenor collaborated to demonstrate orchestration of network services like enhanced mobile broadband (eMBB) and ultra-reliable low-latency communications (URLLC) in network slicing environments.

Let’s take a look at the capabilities of Open Source MANO (OSM) and OpenStack for network slicing.

Orchestrating 5G network slices with OSM

The latest release 5 of ETSI OSM came with major enhancements to support network slicing features. OSM now has an integrated slice manager and an extended information model covering the network slice template (NST) and network slice instance (NSI).

Having a common information model is a vital feature of OSM. Modelling different entities like network function packages (VNF, PNF and hybrid NFs), network service packages and network slice packages in a common way helps overcome complex, repetitive network operations and drastically simplifies and automates daily operations. The OSM network slice feature, together with its IM, allows network services to stay self-contained and agnostic to the underlying infrastructure, even when each service has completely different network characteristics.

The proof-of-concept demo included the deployment of two network slices with some input parameters and operating them through day-2 operations at the network slice level.

Deployment

  • Each slice is modeled as a set of network services connected by networks or VLDs
  • Simple input parameters determine the type of slice to be created on demand
  • The two slices share some network services (shared NS Subnets)
    • If the shared NS has already been deployed, it won’t be re-deployed
    • It will be reused, but the initial configuration for the second network slice can still be done in the shared NS to let it know that new elements are present.
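
As a rough illustration of what this looks like with the OSM client, onboarding a slice template and instantiating a slice from it might be done roughly as follows (a hedged sketch only; command and option names can vary between OSM client versions, and the package, slice and VIM names are placeholders):

# Onboard the network slice template (NST) package
osm netslice-template-create embb_slice_nst.tar.gz

# Instantiate a slice from the template with a few simple input parameters
osm netslice-instance-create --nsi_name slice-embb-01 \
    --nst_name embb_slice_nst \
    --vim_account openstack-site-1

# Check the slice instance and the network services it is composed of
osm netslice-instance-list
osm ns-list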

Operation

Running day-two primitives at Network Slice level (handled as a single object)

  • OSM, behind the scenes, maps them to a sequence of calls to NS primitives, which, in turn, are translated into calls to VNF primitives
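
For example, a primitive that OSM ends up executing against one of the network services that make up the slice might look like the call below (illustrative only; the NS name, primitive name and parameters are placeholders):

# OSM resolves a slice-level action into NS-level primitives like this one,
# which are in turn executed as VNF primitives by the corresponding charms
osm ns-action slice-embb-01-core --action_name add-subscriber \
    --params '{imsi: "001010123456789", apn: "internet"}'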

Here’s a graphic of how it works. Slice one is dedicated to enhanced mobile broadband (eMBB) and slice two to ultra-reliable low-latency communications (URLLC) use cases.

Figure – Network Slice Orchestration with OSM

OpenStack support for network slicing

In the last two years, OpenStack has focused on fulfilling the orchestration and infrastructure management requirements of the telco cloud. The software contains various projects across the networking and compute domains that can be utilized for the various aspects required in network slicing.

As discussed in a session by Interdynamix architects, OpenStack can be used to satisfy quality-of-service, isolation, segregation, shared-network and automation/orchestration requirements via Neutron APIs. In this use case, OpenStack is mainly highlighted for the policy and scheduling features in projects like Neutron for networking and Nova for compute. OpenStack’s group-based policy (GBP) is suggested for network slicing: it can enable self-service automation and application scaling, separation of policies between slices, management of security requirements and more.
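
For example, per-slice bandwidth guarantees can already be expressed with Neutron’s QoS API. A minimal sketch with the OpenStack CLI (policy, network and values are illustrative) might look like this:

# Create a QoS policy for the slice and attach a bandwidth-limit rule
openstack network qos policy create embb-slice-qos
openstack network qos rule create --type bandwidth-limit \
    --max-kbps 500000 --max-burst-kbits 50000 embb-slice-qos

# Bind the policy to the slice’s tenant network so its ports inherit it
openstack network set --qos-policy embb-slice-qos embb-slice-net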

Complementary OSF projects like StarlingX and Zuul also have capabilities that align to support 5G network slicing.

About the author

Sagar Nangare is a technology blogger, focusing on data center technologies (networking, telecom, cloud, storage) and emerging domains (edge computing, IoT, machine learning, AI). He works at Calsoft Inc. as a digital strategist.

The post What’s next: 5G network slicing with ETSI OSM 5 and OpenStack appeared first on Superuser.

by Sagar Nangare at May 24, 2019 02:01 PM

Aptira

Platform and Service Assurance in a Network Functions Virtualisation (NFV) Platform

One of the biggest challenges in building a Network Functions Virtualisation (NFV) platform is reducing OPEX costs while bringing flexibility to the platform. Most vendors offer monitoring tools; however, many of these tools lack the visibility to detect issues taking place within other components, requiring the use of multiple systems, increasing cost and reducing platform flexibility.


The Challenge

Platform agility is only possible if there is complete operational visibility across the following components of the NFV stack:

  • Virtualized Network Functions (VNF)
  • Virtualized Infrastructure Manager (VIM) where VNF workloads are deployed as either VMs or containers
  • Hardware/Infrastructure layer – Racks, Bare metal nodes
  • Network Layer – Switches, Routers, SDNs

Most vendors offer different suites of monitoring tools for each component in order to ensure the operational and production readiness of the layer they operate in. For instance, each VNF vendor rolls out a Virtual Network Functions Manager (VNFM) that handles life cycle events, e.g. self-remediation of a service in the VNF should it encounter a problem. However, this VNF-specific monitoring tool doesn’t have visibility of the issues occurring in any other component. Problem diagnosis requires an operator to interrogate multiple systems, which means multiple UIs, multiple monitoring models and multiple views and/or reports.


The Aptira Solution

A centralised Service and Platform Assurance system is required to integrate with multiple heterogeneous components. This solves the lack of complete visibility of the whole Network Functions Virtualisation platform across different Network Functions Virtualisation Infrastructure (NFVi) Points of Presence (PoPs). Implementing such a centralised system requires identifying all the failure domains in the platform, their critical data points and a mechanism to extract those data points.

The key responsibilities of the system include:

  • A data collection mechanism to collect data points such as performance metrics and usage data
  • A policy framework that defines a set of policies to correlate the data collected and perform corrective actions by detecting anomalies
  • A single dashboard view that gives the information of all KPIs in the system such as Alarms, Capacity, Impacted services/components, Congestion

A representation of such a system is shown below:

Aptira - NFV Network Function Virtualisation

Aptira solved this for a large Telco by developing a framework using open source tools: the TICK Stack and Cloudify. The TICK Stack was selected due to its wide community support and its existing integrations with third-party software components. Cloudify was selected because of its ability to handle CLAMP policies at scale across NFV domains.

The TICK Stack uses Telegraf as its data collection component, which uses a wide range of plugins to collect different sets of data from multiple sources. Aptira used REST plugins to fetch data from components such as the VNFM, OpenStack and Kubernetes endpoints, and SNMP plugins for legacy VNFs. Once collected, the data is stored in the InfluxDB database for further analysis.
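
As a rough sketch of what this looks like in practice (endpoints, database name and OIDs are illustrative, not the customer’s actual configuration), the Telegraf inputs and the InfluxDB output might be set up like this:

cat << EOF > /etc/telegraf/telegraf.d/nfv-assurance.conf
# Poll a REST endpoint exposed by the VNFM / OpenStack / Kubernetes APIs
[[inputs.http]]
  urls = ["http://vnfm.example.local:8080/api/v1/metrics"]
  data_format = "json"

# Poll legacy VNFs over SNMP
[[inputs.snmp]]
  agents = ["udp://10.0.0.10:161"]
  version = 2
  community = "public"
  [[inputs.snmp.field]]
    name = "sysUpTime"
    oid = "RFC1213-MIB::sysUpTime.0"

# Store everything in InfluxDB for analysis by Kapacitor
[[outputs.influxdb]]
  urls = ["http://influxdb.example.local:8086"]
  database = "nfv_assurance"
EOF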

The TICK Stack uses the Kapacitor component for defining event management policies. These policies correlate the events/data collected and trigger corrective actions. Aptira designed and implemented policies that act on data collected from OpenStack endpoints and on VNF telemetry data to detect anomalies and trigger a remediation plan. For example, detecting that a VNF is unhealthy (e.g. due to high CPU load/throughput) and triggering a remediation process (e.g. auto-scaling to distribute the high load across more instances of the VNF).
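
A minimal sketch of such a policy is shown below, assuming CPU metrics are already flowing into InfluxDB and that a hypothetical trigger-remediation.sh script hands the event to the orchestrator:

cat << 'EOF' > vnf_cpu_alert.tick
stream
    |from()
        .measurement('cpu')
    |alert()
        // flag the VNF as unhealthy when idle CPU drops below 10%
        .crit(lambda: "usage_idle" < 10)
        // hand the event to a script that kicks off the remediation workflow
        .exec('/usr/local/bin/trigger-remediation.sh')
EOF

kapacitor define vnf_cpu_alert -type stream -tick vnf_cpu_alert.tick -dbrp nfv_assurance.autogen
kapacitor enable vnf_cpu_alert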

Since the VNFs are modelled and orchestrated by the Cloudify NFVO, Kapacitor policies interact with Cloudify to perform corrective actions at the domain level, such as rerouting all traffic destined for the affected VNF to another VNF, thereby applying a closed-loop control policy.


The Result

To have complete visibility of the platform and the services running on it, it is important to have a subsystem integrated into the Network Functions Virtualisation platform that not only ensures the uptime of the components, but also provides enough information for the operations team to identify anomaly patterns and give quick feedback to the teams concerned.

These open source tools have enabled us to provide the required visibility into the customer’s NFV platform, reducing their OPEX costs and increasing the flexibility of the platform.


Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Platform and Service Assurance in a Network Functions Virtualisation (NFV) Platform appeared first on Aptira.

by Aptira at May 24, 2019 01:25 PM

Chris Dent

Placement Update 19-20

Placement update 19-20. Lots of cleanups in progress, laying in the groundwork to do the nested magic work (see themes below).

The poll to determine what to do with the weekly meeting will close at the end of today. Thus far the leader is office hours. Whatever the outcome, the meeting that would happen this coming Monday is cancelled because many people will be having a holiday.

Most Important

The spec for nested magic is ready for more robust review. Since most of the work happening in placement this cycle is described by that spec, getting it reviewed well and quickly is important.

Generally speaking: review things. This is, and always will be, the most important thing to do.

What's Changed

  • os-resource-classes 0.4.0 was released, promptly breaking the placement gate (the tests are broken, not os-resource-classes). Fixes underway.

  • Null root provider protections have been removed and a blocker migration and status check added. This removes a few now redundant joins in the SQL queries which should help with our ongoing efforts to speed up and simplify getting allocation candidates.

  • I had suggested an additional core group for os-traits and os-resource-classes but after discussion with various people it was decided it's easier/better to be aware of the right subject matter experts and call them in to the reviews when required.

Specs/Features

  • https://review.opendev.org/654799 Support Consumer Types. This is very close with a few details to work out on what we're willing and able to query on. It's a week later and it still only has reviews from me so far.

  • https://review.opendev.org/658510 Spec for Nested Magic. Un-wipped.

  • https://review.opendev.org/657582 Resource provider - request group mapping in allocation candidate. This spec was copied over from nova. It is a requirement of the overall nested magic theme. While it has a well-defined and refined design, there's currently no one on the hook to implement it.

These and other features being considered can be found on the feature worklist.

Some non-placement specs are listed in the Other section below.

Stories/Bugs

(Numbers in () are the change since the last pupdate.)

There are 20 (-3) stories in the placement group. 0 are untagged. 2 (-2) are bugs. 5 are cleanups. 11 (-1) are rfes. 2 are docs.

If you're interested in helping out with placement, those stories are good places to look.

On launchpad:

osc-placement

osc-placement is currently behind by 11 microversions. No change since the last report.

Pending changes:

Main Themes

Nested Magic

At the PTG we decided that it was worth the effort, in both Nova and Placement, to make the push to make better use of nested providers — things like NUMA layouts, multiple devices, networks — while keeping the "simple" case working well. The general ideas for this are described in a story and an evolving spec.

Some code has started, mostly to reveal issues:

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting. A spec has started. There are some questions about request and response details that need to be resolved, but the overall concept is sound.

Cleanup

As we explore and extend nested functionality we'll need to do some work to make sure that the code is maintainable and has suitable performance. There's some work in progress for this that's important enough to call out as a theme:

Ed Leafe has also been doing some intriguing work on using graph databases with placement. It's not yet clear if or how it could be integrated with mainline placement, but there are likely many things to be learned from the experiment.

Other Placement

Miscellaneous changes can be found in the usual place.

There are several os-traits changes being discussed.

Other Service Users

New discoveries are added to the end. Merged stuff is removed. Starting with the next pupdate I'll also be removing anything that has had no reviews and no activity from the author in 4 weeks. Otherwise these lists get too long and uselessly noisy.

End

As indicated above, I'm going to tune these pupdates to make sure they are reporting only active links. This doesn't mean stalled out stuff will be ignored, just that it won't come back on the lists until someone does some work related to it.

by Chris Dent at May 24, 2019 12:07 PM

May 23, 2019

OpenStack Superuser

What’s next for Zuul

DENVER —  At the first Open Infrastructure Summit, continuous integration/continuous delivery project Zuul crossed the threshold to independent project status along with Kata Containers.

After graduation, now what? James Blair, Zuul maintainer and member of the office of the CTO at Red Hat, says that like all new grads, the project gating system is asking a lot of important “What if?” questions:

  • What if we make this change?
  • What if we upgrade this dependency?
  • What happens to the whole system if this micro-service changes?
  • What if the base container image changes?

Zuul is a project under the OSF’s CI/CD strategic focus area. The community is busy adding new features but one especially worth focusing on is changing the way people develop containerized software.

Zuul is more than CI/CD, Blair says. It’s a new way of testing, called speculative execution, that gives developers freedom to experiment. “We’ve done it for years with Git, but now we can do it with containers,” he adds.

Here’s why that’s important.

Container images are built in layers. In the stacked image system, the registry is the intermediary. This can lead to images in production with inadequate testing and to upstream images breaking downstream. “We want to know that changes to base images won’t break the images composed on top,” he says.

With Zuul speculative container images, registries are created as needed. Registries are ephemeral, lasting only as long as each test. Every deployment sees its future state, including the layers it depends on. And this process is invisible to deployment tooling. Test jobs use Zuul’s registry and speculative or production images. When images have been reviewed and pass tests, they’re safe to promote to production.

A key design point: you can use your actual production deployment tooling in tests. Zuul’s speculative container images make testing more like production, not the other way around. Zuul allows its users to move fast without fear, because its speculative execution feature lets them find issues and verify solutions in complex systems before committing a single change to production.
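
Conceptually, a test job’s interaction with the speculative registry looks something like this (a simplified sketch, not actual Zuul job configuration; the registry address is a placeholder that Zuul would supply to the job, and the app Dockerfile is assumed to take its base image as a build argument):

# Build the base image from the change under test and push it to the
# ephemeral per-test registry
docker build -t zuul-registry.example:5000/myorg/base:speculative ./base
docker push zuul-registry.example:5000/myorg/base:speculative

# Build the application image on top of the speculative base, exactly as the
# production tooling would, then run the test suite against the result
docker build --build-arg BASE=zuul-registry.example:5000/myorg/base:speculative \
    -t zuul-registry.example:5000/myorg/app:speculative ./app
docker run --rm zuul-registry.example:5000/myorg/app:speculative ./run-tests.sh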

Catch the whole six-minute presentation below and check out the other Zuul talks from the Summit.

Get involved

Get the Source
Zuul is Free and Open Source Software. Download the source from git.zuul-ci.org or install it from PyPI.

Read the Docs
Zuul offers extensive documentation.

Join the Mailing List
Zuul has mailing lists for announcements and discussions.

Chat on IRC
Join #zuul on FreeNode.

The post What’s next for Zuul appeared first on Superuser.

by Superuser at May 23, 2019 02:21 PM

Aptira

Network Functions Virtualisation Orchestration (NFVO) Homologation

A major Telco needed to establish whether a particular Network Functions Virtualisation Orchestration (NFVO) solution performed correctly, not only as per the vendor’s claims, but also as per their specific market requirements.


The Challenge

Network Functions Virtualisation (NFV) Orchestration (NFVO) co-ordinates the resources and networks needed to set up cloud-based services and applications. The customer has a long-established Model Lab which closely mimics their operational production environment and includes instances of various compute hardware and network equipment. There were multiple challenges:

  • The time available in the Model Lab for this exercise was limited
  • Remote access to the infrastructure was intermittent and technically complex
  • Limited support resources
  • The verification constraints for the homologation process were very specific to the customer

The Aptira Solution

This assignment required both broad and deep technical knowledge and the ability to think on the fly as problems arose or technical requirements were clarified.

In addition to our own internal team, we reached out to our network of partners and identified a team of software engineers across Israel, Ukraine and Poland who could provide extra support across multiple time zones for this project. A team was spun up including project management and technical leadership in Australia, on-site with the customer, and software engineers spanning four continents. The virtual team co-ordinated activities using Jira, Confluence and Slack.

This team was able to assign tasks amongst themselves to work in parallel:

  • Lab access, environment detail
  • VIM configuration and core software installation
  • Orchestration policy development and testing

Most of the work was completed on the developer’s own infrastructure and integrated into Aptira’s lab environment (with the appropriate simulation of external interfaces). Only once we were sure that the orchestration policies were working from a logic perspective did we schedule access to the customer’s Model Lab to install and test the configuration.  

It was only after the orchestration configurations were installed that we could actually interface with the very specific items of equipment required by the customer. These items include: 

  • Cloudify Manager cluster in VMware and VMs in OpenStack targeted to be orchestrated by Cloudify
  • Cloudify Manager and vCenter
  • Ericsson Cloud Manager (ECM) and Ericsson Execution Environment (EEC)

After each use case was validated by the customer, it was then rolled out of the Model Lab to free up resources. This validation included:

  • Cloudify Manager HA failover
  • CLAMP functions such as Auto-heal and Auto-scale
  • LDAP-based Authentication
  • Ansible Integration using Netconf
  • User RBAC and Resource Authorisation process (custom development)
  • Alarm Generation
  • Reporting

During the validation process, having resources on different continents meant that we had a de-facto follow-the-sun support arrangement. As such, we were able to fix problems rapidly if we encountered issues in the customer’s Model Lab.


The Result

Although the total job was not huge, the customer’s lab constraints and the specificity of the validation requirements meant that this was an exacting assignment requiring great attention to detail.

As a result of this assignment, we were able to confirm that the Network Functions Virtualisation Orchestration (NFVO) solution performed correctly, not only as per the vendor claims, but also as per their specific market requirements. 

This project could not have been completed without the support of our partners. This is why we go to great lengths to select the best of the best when it comes to technology partners. We do this to provide our customers with innovative solutions that bring better consistency, performance and flexibility. With these partnerships, we’re able to deliver seamless services worldwide without the limitations of operating across multiple time zones.


Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Network Functions Virtualisation Orchestration (NFVO) Homologation appeared first on Aptira.

by Aptira at May 23, 2019 01:58 PM

May 22, 2019

OpenStack Superuser

Takeaways from the first open multi-vendor NFV showcase

At the recent Open Infrastructure Summit in Denver, Whitestack announced the results of an initiative called the “Open Multivendor NFV Showcase,” an effort to demonstrate that network services orchestrated through network function virtualization, integrating VNFs from multiple vendors on top of commoditized hardware, are possible.

This effort, organized by Whitestack, has the support of relevant institutions in the NFV field, in particular: Intel, the OpenStack Foundation, Open Source MANO and the ETSI NFV Plugtests Programme.

For this first edition, Whitestack invited a number of vendors and projects that together provide a complete end-to-end service chain, covering critical parts of a typical mobile network. Specifically, the following VNF vendors were integrated to provide a fully-functional and automated network service:

  • Fortinet: Next Generation FW.
  • Open Air Interface: LTE EPC core.
  • Mobileum: Diameter Routing Agent and Network Traffic Redirection.
  • ng4T: vTester supporting Cloud-RAN, vHSS and other functions.

See the complete session, including a live demo, here or below, and download the complete report by clicking here.

The post Takeaways from the first open multi-vendor NFV showcase appeared first on Superuser.

by Gianpietro Lavado at May 22, 2019 02:04 PM

Aptira

OpenStack Container Orchestration with Kolla-Ansible


A leading international Software Defined Networks (SDN) and Data Analytics provider wanted to upgrade their applications to utilise OpenStack and Container Orchestration, but was running into complications and needed a bit of extra help.


The Challenge

This customer had attempted to deploy OpenStack on their own several times but had run into complications. OpenStack is not their area of expertise and their design was based on TripleO – which is quite complicated to deploy and operate. They required help with the platform design and configuration so they could containerise their applications. As they are located overseas, our Solutionauts were operating across time zones and completing all work remotely.


The Aptira Solution

Aptira designed a containerised OpenStack solution utilising Ceph as the backend storage for image, volume and compute. We then used kolla-ansible to deploy Ceph and OpenStack. We chose this configuration because it’s relatively simple to use kolla-ansible to customise configurations and to change the deployment, making ongoing configuration changes easier for the customer once the project has been handed over.
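
As a rough sketch of what this configuration involves (variable names follow the Rocky-era kolla-ansible defaults and may differ in other releases), the relevant globals.yml settings and deployment commands look something like this:

cat << EOF >> /etc/kolla/globals.yml
enable_ceph: "yes"
glance_backend_ceph: "yes"
cinder_backend_ceph: "yes"
nova_backend_ceph: "yes"
EOF

kolla-ansible -i ./multinode bootstrap-servers
kolla-ansible -i ./multinode prechecks
kolla-ansible -i ./multinode deploy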

The Ceph cluster has 4 replicas, with the Ceph mons/mgrs running on 3 rack servers, while the object storage devices (OSDs) run across 8 blade servers in two chassis. There are three regions to host their apps in different failure domains for redundancy. The OpenStack controllers are converged with the Ceph mons on the rack servers, and compute nodes are collocated with OSDs on the blade servers.

We successfully resolved a number of issues that arose during the implementation. One issue we faced was a memory leak bug in the OVS code which had not yet been fixed upstream. As a temporary workaround, we restart the Neutron agent services regularly to release memory until the bug is fixed upstream. To speed up this process and remove manual intervention, we set up a cron job which restarts the Neutron agent services outside of business hours.
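
The workaround amounts to a scheduled restart roughly like the one below (illustrative only; the container name assumes a kolla-ansible deployment and the schedule is just an example):

# Restart the OVS agent container at 02:30 each night to release leaked memory
(crontab -l 2>/dev/null; echo "30 2 * * * docker restart neutron_openvswitch_agent") | crontab -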

A separate challenge was that the default HAProxy maxconn value was not large enough, resulting in instability. To resolve this, we increased the maxconn value in the HAProxy config file, improving the stability of the platform.
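
The change itself is a one-line tuning of HAProxy’s connection limit, along the lines of the sketch below (the path and value are placeholders; in a kolla-ansible deployment haproxy.cfg is generated from templates, so the override belongs there rather than in the rendered file):

# Bump every maxconn directive in the rendered config, as a blunt illustration
sed -i 's/^\( *maxconn\) .*/\1 40000/' /path/to/haproxy.cfg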


The Result

We delivered the Ceph-backed OpenStack Cloud on which their applications are now deployed. The configuration has passed their HA tests and is being used in production.

It is important to note that we deployed OpenStack Rocky which was the latest stable OpenStack release at the time of this project. Unfortunately, Kolla-ansible is unable to complete upgrades and downgrades on this version, so future work will be required in order to simplify the upgrade/downgrade process. Stay tuned!


Orchestrate your Application into the Future
Containerise your App

Get Started

The post OpenStack Container Orchestration with Kolla-Ansible appeared first on Aptira.

by Aptira at May 22, 2019 01:31 PM

Thomas Goirand

Wrote a Debian mirror setup puppet module in 3 hours

As I needed the functionality, I wrote this:

https://salsa.debian.org/openstack-team/puppet/puppet-module-debian-archvsync

The matching Debian package has been uploaded and is now in the NEW queue. Thanks a lot to Waldi for packaging ftpsync, which I’m using.

Comments and contributions are welcome.

by Goirand Thomas at May 22, 2019 12:40 PM

CERN Tech Blog

Cluster Autoscaling for Magnum Kubernetes Clusters

The Kubernetes cluster autoscaler has been in development since 2016, with early support for the major public cloud providers for Kubernetes. But there has been no way to use it when running Kubernetes on OpenStack until now, with the addition of the autoscaler cloud provider for Magnum. As an OpenStack cloud operator with around 400 Magnum clusters (the majority Kubernetes), CERN has a lot to gain from the flexibility that the cluster autoscaler provides.
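
A hedged sketch of how an autoscaling-enabled cluster might be requested through Magnum is shown below (label names follow the Magnum cluster-autoscaler integration and may vary by release; the template name and node counts are placeholders):

openstack coe cluster create \
    --cluster-template kubernetes-v1.14 \
    --node-count 2 \
    --labels auto_scaling_enabled=true,min_node_count=1,max_node_count=10 \
    autoscaled-k8s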

by CERN (techblog-contact@cern.ch) at May 22, 2019 10:00 AM

May 21, 2019

OpenStack Superuser

Firecracker and Kata Containers: Sparking more open collaboration

DENVER — Some pairings really do spark joy. Peanut butter and chocolate. Wine and cheese. Biscuits and gravy. The concept crosses over to the tech world: Firecracker and Kata Containers.

On the Open Infrastructure keynote stage in Denver, Samuel Ortiz of the Kata Containers architecture committee and Andreea Florescu, a maintainer on the Firecracker project, talked about how the projects are working together.

The pair introduced a new collaborative project: rust-vmm. Firecracker allows Kata Containers to support a large number of container workloads, but not all of them. OSF, Amazon, Intel, Google and others are now collaborating to build a custom container hypervisor. Enter rust-vmm, a project featuring shared virtualization components to build specialized VMMs.

But let’s get up to speed on the two projects and what’s next for them in detail.

Kata Containers

Kata Containers aims to improve security in the container ecosystem by adding lightweight VMs and hypervisors as another, hardware-based workload isolation layer for containers. Kata Containers has offered a number of enhancements since May 2018 (six releases to date, with another shipping soon), including:

  • Improved performance with VM templating, TC mirroring for better networking performance, and the soon-to-be-integrated virtio-fs support.
  • Improved simplicity and operability by adding distributed tracing support, live update and overall simplified architecture based on vsock.
  • Improved industry support by adding new hardware architectures like ARM64, ppc64 and s390.
  • Even stronger security architecture by adding more libcontainer-based isolation layers inside the virtual machine, but most importantly by supporting more hypervisors, including QEMU, NEMU and Firecracker.

Firecracker

Firecracker is an open-source, lightweight virtual machine monitor written in Rust. It leverages Linux Kernel Virtual Machine (KVM) to provide isolation for multi-tenant cloud workloads like containers and functions.

What makes it great:

  • It “boots blazingly fast” (under 125 milliseconds)
  • Low memory footprint, helping it achieve high densities (<5MiB)
  • Oversubscription
  • Security: two boundaries–virtualization and jailer

Florescu also outlined some of the main enhancements in progress:

  • ARM and AMD support
  • Refactoring the codebase for standalone virtualization components that can be used by other projects.
  • Container integration: Transitioning from an experimental implementation of Vsock to a production ready version; also integrating firecracker-containerd, which is a container runtime on top of Firecracker.

Check out the whole 12-minute keynote below and stay tuned for a video from their Summit session titled “Tailor-made security: Building a container specific hypervisor.”

Photo // CC BY NC

The post Firecracker and Kata Containers: Sparking more open collaboration appeared first on Superuser.

by Superuser at May 21, 2019 02:07 PM

Aptira

Zenoss Implementation


This use case covers two of a Telco’s custom-developed platforms that provide network overlay services to Fortune 500 companies and government entities. They require platform-wide monitoring and would like to utilise Zenoss – an intelligent application and service monitoring tool.


The Challenge

The two custom platforms this Telco has developed consist of OpenFlow Network infrastructure, OpenStack Infrastructure and several applications including: Cloudify, an SDN Controller and Network Flow Programming Tools.

The customer had a requirement for platform-wide monitoring to capture operational events in the platform and send them to a third-party dashboard in near real-time. They had already selected (and were previously using) a tool named Zenoss as the event monitoring and management platform and asked Aptira to configure it to meet their requirements.

The platform components that required monitoring included:

Hardware

  • Bare Metal Servers
  • Top of Rack Network Switches
  • Noviflow switches

Software

  • Linux Operating System
  • Server OS (KVM hypervisors)
  • Top of Rack Network Switch Operating System
  • Noviware Operating System

Applications

  • OpenStack + Ceph cluster
  • Cloudify cluster
  • SDN controllers

Their requirements extended to additional metrics and custom events, thresholds and alerts that were not available in the standard platform. In order to implement these requirements, we exploited a feature of Zenoss that allows easy expansion of monitoring capability as modular plugins. These plugins are called ZenPacks.

Some of the customer’s requirements were covered by existing ZenPacks, e.g. the OpenStack and Bare Metal Server iLO ZenPacks. However, most requirements were not covered by any existing ZenPack. Examples of components that needed additional capabilities include:

  • Cloudify Services and Cluster health check
  • SDN controller service and UI health check
  • SDN Etree, Eline services status
  • Noviflow Eline and Etree paths
  • Noviflow CLI, OF, Physical ports status, etc

Implementing these additional capabilities was a key objective of Aptira’s solution.


The Aptira Solution

Aptira developed custom capabilities for Zenoss to provide the required custom monitoring functionality and to send alerts to a third-party dashboard. These additional requirements were fulfilled by developing custom plugin capabilities.

We considered the option of enhancing existing ZenPacks like OpenStack, Bare Metal Servers iLO and Linux. However, this would have created dependencies on multiple existing ZenPacks and complicated their lifecycle management, as any updates to those ZenPacks would have to be continually merged with the custom-developed enhancements.

Instead, we developed a single custom ZenPack and integrated it with Zenoss and the platforms, implementing the required functionality while keeping the existing ZenPacks easy to update and maintain.

To send the alerts to the third-party dashboard, we created Ansible playbooks which can easily add/remove devices to/from Zenoss and the third-party alerting dashboard.

Aptira also developed Ansible playbooks to perform maintenance functions on the integrated Zenoss solution, including adding devices, configuring events and triggers and notification/alerts via those playbooks.
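
As a rough illustration of the kind of call such a playbook makes, adding a device through the Zenoss JSON API might look like this (host, credentials, device name and device class are placeholders):

curl -s -u admin:password \
    -H "Content-Type: application/json" \
    -X POST http://zenoss.example.local:8080/zport/dmd/device_router \
    -d '{"action": "DeviceRouter", "method": "addDevice",
         "data": [{"deviceName": "sdn-controller-01", "deviceClass": "/Server/Linux"}],
         "tid": 1}'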

This entire process was completed (including requirements, design, development and configuration) within the customer’s platform. This included:

  • Adding the devices in the third-party alerting dashboard and Zenoss
  • Configuring the Ansible playbooks that perform events notification
  • Installing the custom-developed ZenPack on the customer’s operational Zenoss system

The Result

Aptira was able to successfully complete this enhancement, enabling the customer to monitor all devices according to their custom requirements. Events are now visible within Zenoss as well as the third-party dashboard.


Monitoring and Machine Learning
Detect Anomalies within Complex Systems

Find Out Here

The post Zenoss Implementation appeared first on Aptira.

by Aptira at May 21, 2019 01:37 PM

Carlos Camacho

The Kubernetes in a box project

Implementing cloud computing solutions that run in hybrid environments might be the best answer when it comes to finding the best benefit/cost ratio.

This post will be the main thread to build and describe the KIAB/Kubebox project (www.kubebox.org and/or www.kiab.org).

Spoiler alert!

The name

First things first, the name. I have two names in mind, both with the same meaning. The first one is KIAB (Kubernetes In A Box); this name came to my mind from the Kiai sound of karatekas (practitioners of karate). The second one is more traditional, “Kubebox”. I have no preference, but it would be awesome if you help me decide the official name for this project.

Add a comment and contribute to select the project name!

Introduction

This project is about integrating commercially available devices to run cloud software as an appliance.

The proof-of-concept delivered in this series of posts will allow people to put a well-known set of hardware devices into a single chassis, either to create their own cloud appliances or to use for research and development, continuous integration, testing, home labs, staging or production-ready environments, or simply just for fun.

Hereby I humbly present to you the design of KubeBox/KIAB, an open chassis specification for building cloud appliances.

The case enclosure is fully designed and is hopefully in the last phases of building the first set of enclosures. The posts will appear as I find free cycles to write the overall description.

Use cases

Several use cases can be defined to run on a KubeBox chassis.

  • AWS outpost.
  • Development environments.
  • EDGE.
  • Production Environments for small sites.
  • GitLab CI integration.
  • Demos for summits and conferences.
  • R&D: FPGA usage, deep learning, AI, TensorFlow, among many others.
  • Marketing WOW effect.
  • Training.

Enclosure design

The enclosure is designed as a rackable 7U unit. It tries to minimize the space needed to deploy a cluster of up to eight nodes with redundancy for both power and networking.

Cloud appliance description

This build will be described across several sub-posts linked from this main thread. The posts will be created in no particular order, depending on my availability.

  • Backstory and initial parts selection.
  • Designing the case part 1: Design software.
  • A brief introduction to CAD software.
  • Designing the case part 2: U’s, brakes, and ghosts.
  • Designing the case part 3: Sheet thickness and bend radius.
  • Designing the case part 4: Parts Allowance (finish, tolerance, and fit).
  • Designing the case part 5: Vent cutouts and frickin’ laser beams!.
  • Designing the case part 6: Self-clinching nuts and standoffs.
  • Designing the case part 7: The standoffs strike back.
  • A brief primer on screws and PEMSERTs.
  • Designing the case part 8: Implementing PEMSERTs and screws.
  • Designing the case part 9: Bend reliefs and flat patterns.
  • Designing the case part 10: Shelf caddy, to be used with GPU, MB, disks, any other peripherals you want to add to the enclosure.
  • Designing the case part 11: Components rig.
  • Designing the case part 12: Power supply.
  • Designing the case part 13: Networking.
  • Designing the case part 14: 3D printed supports.
  • Designing the case part 15: Adding computing power.
  • Designing the case part 16: Adding Storage.
  • Designing the case part 17: Front display and bastion for automation.
  • Manufacturing the case part 1: PEMSERT installation.
  • Manufacturing the case part 2: Bending metal.
  • Manufacturing the case part 3: Bending metal.
  • KubeBox cloud appliance in detail!.
  • Manufacturing the case part 0: Getting quotes.
  • Manufacturing the case part 1: Getting the cases.
  • Software deployments: Reference architecture.
  • Design final source files for the enclosure design.
  • KubeBox is fully functional.

Update log:

2019/05/21: Initial version.

by Carlos Camacho at May 21, 2019 12:00 AM

May 20, 2019

OpenStack Superuser

Inside open infrastructure: The latest from the OpenStack Foundation

Welcome to the latest edition of the OpenStack Foundation Open Infrastructure newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

Spotlight on the Open Infrastructure Summit Denver

The global community gathered recently in Denver for the Open Infrastructure Summit followed by the Project Teams Gathering (PTG). This was the first edition under the new name, which was changed to better reflect the diversity of open-source communities collaborating at the Summit. With the co-location of the PTG, the week had more collaborative sessions than ever and attendees had the opportunity to collaborate throughout the week with presentations, workshops, and collaborative sessions covering the development, integration and deployment of more than 30 open-source projects.

The theme of the week was “Collaboration without Boundaries,” a call to the community shared by Jonathan Bryce in his Monday morning keynote. Collaboration was exemplified throughout the week from the developers, operators and vendors attending the event:

  • Developers from the Kata Containers and Firecracker projects highlighted the progress around community collaboration and project integration. They also discussed Rust-VMM, a cross-project collaborative initiative to develop container-specific hypervisors.
  • Operators from Baidu, Blizzard Entertainment, CERN, Box, Adobe Advertising Cloud and more presented their open infrastructure use cases, highlighting the integration of multiple technologies including Ceph, Kata Containers, Kubernetes, and OpenStack among 30+ other projects.
  • 5G was front and center at the Denver Summit. In a demonstration of open collaboration, Ericsson partnered with AT&T to host a 5G Lounge where attendees could test the latency of network speeds while playing a virtual reality game, Strike a Light. Users like China Mobile and AT&T presented about their 5G deployments. At AT&T, 5G is powered by an Airship-based containerized OpenStack cloud.
  • The NIST Public Working Group on Federated Cloud and The Open Research Cloud Alliance discussed possible federation deployment and governance models that embody the key concepts and design principles being developed in the NIST/IEEE Joint working group and ORCA. They want to encourage developers, users and cloud operators to provide use cases and feedback as they move forward in these efforts.

 

Amy Wheelus and Mark Collier in an epic latency battle with 3G, 4G and 5G on the keynote stage.

Denver Summit session videos are now available and for more Summit announcements, user sessions and news from the Open Infrastructure ecosystem, check out the Superuser recap.
Next, the Open Infrastructure Summit and PTG are heading to Shanghai. Registration and Sponsorship sales are now open. If you’re interested in speaking, the Call for Presentations is open. Check out the list of Tracks and submit your presentations, panels and workshops before July 2, 2019.

OpenStack Foundation news

  • At the Open Infrastructure Summit, the OpenStack Board of Directors confirmed Zuul as a top-level Open Infrastructure project, joining OpenStack and recently confirmed Kata Containers.
  • The OSF launched the OpenStack Ironic Bare Metal Program in Denver, highlighting the commercial ecosystem for Ironic, at-scale deployments of Ironic, and the evolution of OpenStack beyond virtual machines. Case studies by CERN and Platform9 were published along with the announcement.

OpenStack Foundation project news

Airship

  • The Airship team delivered its first release at the Open Infrastructure Summit Denver. Airship 1.0 delivers a wide range of enhancements to security, resiliency, continuous integration and documentation, as well as upgrades to the platform, deployment and tooling features.

Kata Containers

  • The community delivered several talks during the Open Infrastructure Summit in Denver that you can check out among the videos from the event.
  • Kata Containers continues to provide improvements around performance, stability and security.  Expected this week, the 1.7 release of Kata Containers includes experimental support for virtio-fs in the NEMU VMM. For workloads which require host to guest sharing, virtio-fs provides improved performance and compatibility compared to 9pfs. This release adjusts the guest kernel in order to facilitate Docker-in-Kata use cases, and adds support for the latest version of Firecracker.

OpenStack

StarlingX

  • There were five sessions dedicated to the project at the Open Infrastructure Summit in Denver, check out the videos here.
  • Participants packed the room for a hands-on workshop to try out the StarlingX platform on hardware donated by Packet.com. If you missed this one, keep an eye out for similar workshops at upcoming community and industry events.
  • The team had great discussions during Forum sessions as well as at the PTG to deep dive into the details of processes, testing and roadmap planning for the upcoming two releases.

Zuul

  • The community delivered several talks during the Open Infrastructure Summit in Denver that you can check out among the videos from the event.
  • Zuul 3.8.1 was released, fixing a memory leak introduced in the previous 3.8.0 release. Users should update to at least 3.8.1 to get this fix. More info can be found in the release notes.
  • Nodepool 3.6.0 was released. This release improves API rate limiting against OpenStack clouds and statsd metric gathering performed by the builder process. Find more info in the release notes.

OSF @ Open Infrastructure community events

Questions / feedback / contribute

This newsletter is written and edited by the OpenStack Foundation staff to highlight open infrastructure communities. We want to hear from you!
If you have feedback, news or stories that you want to share, reach us through community@openstack.org . To receive the newsletter, sign up here.

The post Inside open infrastructure: The latest from the OpenStack Foundation appeared first on Superuser.

by OpenStack Foundation at May 20, 2019 02:06 PM

Aptira

Big Data

One of Australia’s largest and best-known organisations has an internal Big Data team who are managing their system in a static and traditional way, which makes it difficult for them to expand the system. So we designed a solution they can use to upgrade their Big Data system from static infrastructure onto a flexible OpenStack Cloud.


The Challenge

The customer is running an internal Big Data system, which collects data from various internal data sources, providing a static view of this data to the management team. They are using a Hadoop-based system and have been experiencing issues with scalability. As such, they want to explore a Cloud-based hosting platform, which will give them more flexibility and scalability. They’d also like to migrate their existing production system onto the new hosting platform once it is proven to be stable and production ready.

The design of the Cloud platform had to balance short term specific goals against long term objectives. The Customer’s future vision for the platform included minimisation of the total cost of operational ownership, by designing the platform to require minimal operational intervention, and by maximising the use of automation in all stages of the operations lifecycle.

The challenge was even greater because Aptira was brought into the project relatively late in its development cycle, and therefore many decisions had already been made. Examples include the hardware and networking platforms having already been selected, the rack placement of these machines already being defined, and the general approach to the management of virtual and physical machines already having been determined.


The Aptira Solution

We may be a little biased when it comes to OpenStack, but there really is no better system for requirements such as these. As such, we’ve offered to build an OpenStack cloud to host their Big Data system.

This OpenStack will be integrated with their existing Cisco Application Centric Infrastructure (ACI) in order to provide a complete end-to-end Software Defined Networking (SDN) solution.

A complication that Aptira had to design around was the version compatibilities across the integrated subsystem. For example, Cisco ACI was only supported on OpenStack Ocata, which was rapidly approaching end of support. The integration design required painstaking attention to detail to enable functionality while at the same time honouring the design decisions made by the customer.

We have produced a complete design document which they can use to guide their OpenStack deployment and upgrade their Big Data system from static infrastructure onto a flexible OpenStack Cloud solution.


The Result

This project is still in development – stay tuned for updates!


How can we make OpenStack work for you?
Find out what else we can do with OpenStack.

Find Out Here

The post Big Data appeared first on Aptira.

by Aptira at May 20, 2019 01:39 PM

Carlos Camacho

Running Relax-and-Recover to save your OpenStack deployment

ReAR is a pretty impressive disaster recovery solution for Linux. Relax-and-Recover creates both a bootable rescue image and a backup of the associated files you choose.

When doing disaster recovery of a system, this rescue image plays the files back from the backup, restoring the latest state in the twinkling of an eye.

Various configuration options are available for the rescue image. For example, slim ISO files, USB sticks or even images for PXE servers can be generated. Just as many backup options are possible: starting with a simple archive file (e.g. *.tar.gz), various backup technologies such as IBM Tivoli Storage Manager (TSM), EMC NetWorker (Legato), Bacula or even Bareos can be addressed.

ReAR, written in Bash, enables the skilful distribution of the rescue image and, if necessary, the archive file via NFS, CIFS (SMB) or another transport method over the network. The actual recovery process then takes place via this transport route.

In this specific case, due to the nature of the OpenStack deployment, we will choose the protocols that are allowed by default in the iptables rules (SSH and SFTP in particular).

But enough with the theory, here’s a practical example of one of many possible configurations. We will apply this specific use of ReAR to recover a failed control plane after a critical maintenance task (like an upgrade).

01 - Prepare the Undercloud backup bucket.

We need to prepare the place to store the backups from the Overcloud. From the Undercloud, check you have enough space to make the backups and prepare the environment. We will also create a user in the Undercloud with no shell access to be able to push the backups from the controllers or the compute nodes.

groupadd backup
useradd -g backup -d /data/backup -s /sbin/nologin backup
echo "backup:backup" | chpasswd
chown -R backup:backup /data
chmod -R 755 /data

02 - Run the backup from the Overcloud nodes.

#Install packages
sudo yum install rear genisoimage syslinux lftp -y

#Configure ReAR
cat << EOF >> /etc/rear/local.conf
OUTPUT=ISO
OUTPUT_URL=sftp://backup:backup@undercloud-0/data/backup/
BACKUP_URL=sftp://backup:backup@undercloud-0/data/backup/
EOF

Now run the backup, this should create an ISO image in the Undercloud node (/data/backup/).

sudo rear -d -v mkbackup

Now, simulate a failure xD

# sudo rm -rf /

After the ISO image is created, we can proceed to verify we can restore it from the Hypervisor.

03 - Prepare the hypervisor.

# Install some required packages
sudo yum install -y fuse-sshfs

# Mount the Undercloud backup folder to access the images
mkdir -p /data/backup
sudo sshfs -o allow_other root@undercloud-0:/data/backup /data/backup
ls /data/backup/*

04 - Stop the damaged controller node.

virsh shutdown controller-0

# Wait until is down
watch virsh list --all

# Backup the guest definition
virsh dumpxml controller-0 > controller-0.xml
cp controller-0.xml controller-0.xml.bak

Now, we need to change the guest definition to boot from the ISO file.

Edit controller-0.xml and update it to boot from the ISO file.

Find the OS section,add the cdrom device and enable the boot menu.

<os>
<boot dev='cdrom'/>
<boot dev='hd'/>
<bootmenu enable='yes'/>
</os>

Edit the devices section and add the CDROM.

<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<source file='/data/backup/rear-controller-0.iso'/>
<target dev='hdc' bus='ide'/>
<readonly/>
<address type='drive' controller='0' bus='1' target='0' unit='0'/>
</disk>

Update the guest definition.

virsh define controller-0.xml

Restart and connect to the guest

virsh reset controller-0
virsh console controller-0

You should be able to see the boot menu to start the recovery process; select Recover controller-0 and follow the instructions.

You should see a message like:

Welcome to Relax-and-Recover. Run "rear recover" to restore your system !

RESCUE controller-0:~ # rear recover

The image restore should progress quickly.

Continue to watch the progress of the restore.

Now, each time the node reboots it will have the ISO file as the first boot option, so that’s something we need to fix. In the meantime, let’s check if the restore went fine.

Reboot the guest booting from the hard disk.

Now we can see that the guest VM started successfully.

Now we need to restore the guest to its original definition, so from the Hypervisor we need to restore the controller-0.xml.bak file we created.

#From the Hypervisor
virsh shutdown controller-0
watch virsh list --all
virsh define controller-0.xml.bak
virsh start controller-0

Enjoy.

Considerations:

  • Space.
  • Multiple protocols are supported, but we might then need to update firewall rules; that’s why I preferred SFTP.
  • Network load when moving data.
  • Shutdown/Starting sequence for HA control plane.
  • Do we need to backup the data plane?
  • User workloads should be handled by a third party backup software.

by Carlos Camacho at May 20, 2019 12:00 AM

May 17, 2019

Chris Dent

Placement Update 19-19

Woo! Placement update 19-19. First one post PTG and Summit. Thanks to everyone who helped make it a useful event for Placement. Having the pre-PTG meant that we had addressed most issues prior to getting there meaning that people were freed up to work in other areas and the discussions we did have were highly coherent.

Thanks, also, to everyone involved in getting placement deleted from nova. We did that while at the PTG and had a little celebration.

Most Important

We're still working on narrowing priorities and focusing the details of those priorities. There's an etherpad where we're taking votes on what's important. There are three specs in progress from that that need review and refinement. There are two others which have been put on the back burner (see specs section below).

What's Changed

  • We're now running a subset of nova's functional tests in placement's gate.

  • osc-placement is using the PlacementFixture to run its functional tests making them much faster.

  • There's a set of StoryBoard worklists that can be used to help find in progress work and new bugs. That section also describes how tags are used.

  • There's a summary of summaries email message that summarizes and links to various results from the PTG.

Specs/Features

As the summary of summaries points out, we have two major features this cycle, one of which is large: getting consumer types going and getting a whole suite of features going to support nested providers in a more effective fashion.

  • https://review.opendev.org/654799 Support Consumer Types. This is very close with a few details to work out on what we're willing and able to query on. It only has reviews from me so far.

  • https://review.opendev.org/658510 Spec for Nested Magic. This is associated with a lengthy story that includes visual artifacts from the PTG. It covers several related features to enable nested-related requirements from nova and neutron. It is a work in progress, with several unanswered questions. It is also something that efried started but will be unable to finish so the rest of us will need to finish it up as the questions get answered. And it also mostly subsumes a previous spec on subtree affinity. (Eric, please correct me if I'm wrong on that.)

  • https://review.opendev.org/657582 Resource provider - request group mapping in allocation candidate. This spec was copied over from nova. It is a requirement of the overall nested magic theme. While it has a well-defined and refined design, there's currently no one on the hook to implement it.

There are also two specs that are still live but de-prioritized:

These and other features being considered can be found on the feature worklist.

Some non-placement specs are listed in the Other section below.

Stories/Bugs

There are 23 stories in the placement group. 0 are untagged. 4 are bugs. 5 are cleanups. 12 are rfes. 2 are docs.

If you're interested in helping out with placement, those stories are good places to look.

On launchpad:

Of those there two interesting ones to note:

  • https://bugs.launchpad.net/nova/+bug/1829062 nova placement api non-responsive due to eventlet error. When using placement-in-nova in stein, recent eventlet changes can cause issues. As I've mentioned on the bug the best way out of this problem is to use placement-in-placement but there are other solutions.

  • https://bugs.launchpad.net/nova/+bug/1829479 The allocation table has residual records when instance is evacuated and the source physical node is removed. This appears to be yet another issue related to orphaned allocations during one of the several move operations. The impact they are most concerned with, though, seems to be the common "When I bring up a new compute node with the same name there's an existing resource provider in the way" that happens because of the unique constraint on the rp name column.

I'm still not sure that constraint is the right thing unless we want to make people's lives hard when they leave behind allocations. We may want to make it hard because it will impact quota...

osc-placement

osc-placement is currently behind by 11 microversions. No change since the last report.

Pending changes:

Note: a few of these having been sitting for some time with my +2 awaiting review by some other placement core. Please remember osc-placement when reviewing.

Main Themes

Now that the PTG has passed some themes have emerged. Since the Nested Magic one is rather all encompassing and Cleanup is a catchall, I think we can consider three enough. If there's some theme that you think is critical that is being missed, let me know.

For people coming from the nova-side of the world who need or want something like review runways to know where they should be focusing their review energy, consider these themes and the links within them as a runway. But don't forget bugs and everything else.

Nested Magic

At the PTG we decided that it was worth the effort, in both Nova and Placement, to make the push to make better use of nested providers — things like NUMA layouts, multiple devices, networks — while keeping the "simple" case working well. The general ideas for this are described in a story and an evolving spec.

Some code has started, mostly to reveal issues:

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting. A spec has started. There are some questions about request and response details that need to be resolved, but the overall concept is sound.
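
To make the idea concrete, here's a rough sketch of how an allocation might carry a consumer type once the spec merges. This is a hypothetical request only: the consumer_type field, its value and the microversion that would enable it are assumptions taken from the in-progress spec, not a merged API.

# Hypothetical sketch: consumer_type is not yet part of any released microversion.
curl -s -X PUT "$PLACEMENT_ENDPOINT/allocations/$CONSUMER_UUID" \
  -H "X-Auth-Token: $TOKEN" \
  -H "Content-Type: application/json" \
  -H "OpenStack-API-Version: placement latest" \
  -d '{
        "allocations": {
          "'"$RP_UUID"'": {"resources": {"VCPU": 2, "MEMORY_MB": 4096}}
        },
        "consumer_generation": null,
        "project_id": "'"$PROJECT_ID"'",
        "user_id": "'"$USER_ID"'",
        "consumer_type": "INSTANCE"
      }'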

Cleanup

As we explore and extend nested functionality we'll need to do some work to make sure that the code is maintainable and has suitable performance. There's some work in progress for this that's important enough to call out as a theme:

Ed Leafe has also been doing some intriguing work on using graph databases with placement. It's not yet clear if or how it could be integrated with mainline placement, but there are likely many things to be learned from the experiment.

Other Placement

  • https://review.opendev.org/#/q/topic:refactor-classmethod-diaf A suite of refactorings that, given their lack of attention, perhaps we don't need or want; but let's be explicit about that decision rather than simply ignoring the patches if that is indeed the case.

  • https://review.opendev.org/645255 A start at some unit tests for the PlacementFixture which got lost in the run up to the PTG. They may be less of a requirement now that placement is running nova's functional tests. But again, we should be explicit about that decision.

Other Service Users

New discoveries are added to the end. Merged stuff is removed.

End

I'm out of practice on these things. This one took a long time.

by Chris Dent at May 17, 2019 03:32 PM

Aptira

Custom OpenStack Lab with Ceph Storage and Ansible Playbooks

Most of our case studies are written about our customers. But this time, we’re the ones requiring a new solution to enable our Solutionauts to work more efficiently. We put together a custom OpenStack Lab with Ceph Storage and Ansible playbooks, giving us access to internal resources on demand.


The Challenge

We were previously hosting various resources externally in a Data Centre. In order to make it easy for staff members to acquire and release these resources on demand, we needed to set up an internal lab with a unified system which can be used to manage compute, storage and network resources.


The Aptira Solution

The goal is to have most of our servers (12 servers) managed by OpenStack, and other resources, including 2 Noviflow switches and Ceph storage with 72 TB of space, integrated with OpenStack.

In order to maximise the use of bare-metal machines for user workloads, we chose to virtualise the control plane. As a result, three OpenStack controllers and Ceph monitors run as virtual machines on VMware, and 8 KVM compute nodes run on bare-metal machines. We also have another 3 bare-metal compute nodes managed by Ironic. Three of the 8 KVM compute nodes also act as Ceph OSD nodes, each with twelve 2 TB disks.

This stack runs several projects, including Keystone, Glance, Swift, Cinder, Horizon, Nova, Neutron and Ironic. Glance, Cinder and Nova use Ceph as the storage backend. Keystone uses Aptira's Active Directory for user authentication. Two KVM compute nodes are cabled to our NoviSwitches and have PCI Passthrough enabled, so VMs on these compute nodes can have direct access to the ports cabled to the NoviSwitches.

The whole system is deployed by custom Ansible playbooks that we developed. These playbooks are not specific to Aptira's environment and can be re-used for other projects.


The Result

We now have an internal lab to access resources on demand, rather than having them hosted externally. Ansible playbooks are used to deploy OpenStack and Ceph. When deploying OpenStack components, Ansible calls the upstream OpenStack Puppet modules to do the installation. That way we don't need to re-write Ansible playbooks to install OpenStack components.
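
To illustrate the pattern, here is a minimal sketch of the kind of command such an Ansible task could run on a controller node. The Forge module name is real, but the class parameters shown are illustrative assumptions rather than our actual playbook content.

# Sketch only: apply an upstream OpenStack Puppet module from an Ansible-driven host.
puppet module install openstack-keystone
puppet apply --detailed-exitcodes -e "
  class { '::keystone':
    database_connection => 'mysql+pymysql://keystone:secret@db.example.com/keystone',
  }
"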


Keep your data in safe hands.
See what we can do to protect and scale your data.

Secure Your Data

The post Custom OpenStack Lab with Ceph Storage and Ansible Playbooks appeared first on Aptira.

by Aptira at May 17, 2019 01:13 PM

May 16, 2019

Pablo Iranzo Gómez

Emerging Tech VLC‘19: Citellus - Automating checks

Citellus:

Citellus - Check your systems!!

https://citellus.org

Emerging Tech Valencia 2019: 16 May

Who am I?

Involved with Linux since a little before starting my university studies and then throughout them, participating in the LinUV and Valux.org associations.

I started 'living off' free software in 2004 and joined Red Hat in 2006 as a Consultant, then as a Technical Account Manager and now as a Software Maintenance Engineer.

What is Citellus?

  • Citellus provides a framework, together with scripts created by the community, that automates the detection of problems, including configuration issues, conflicts with installed package versions, security problems or insecure configurations, and much more.

History: how did the project start?

  • A weekend on call, reviewing the same configurations over and over again on various hosts, planted the seed.

  • A few simple scripts and a bash 'wrapper' later, the tool started taking shape; shortly afterwards, the wrapper was rewritten in Python to give it more advanced features.

  • In those early days we also had conversations with engineering and, as a result, a new and simpler design for the tests was adopted.

What can I do with Citellus?

  • Run it against a live system or against a sosreport.
  • Solve problems earlier thanks to the information it provides.
  • Use the plugins to detect current or future problems.
  • Write new plugins in your preferred programming language (bash, python, ruby, etc.) to extend the functionality.
    • Contribute those new plugins to the project for the benefit of others.
  • Use that information as part of proactive actions on your systems.

Any real-life examples?

  • For example, with Citellus you can detect:
    • Incorrect deletion of keystone tokens
    • Missing parameters for expiring and purging ceilometer data, which can end up filling the hard disk.
    • NTP not synchronised
    • Obsolete packages that are affected by critical or security bugs.
    • And more! 850+ plugins at the moment, many of them with more than one check per plugin
  • Anything else you can imagine or program 😉

Changes driven by real-world examples?

  • Initially we only worked with RHEL (6 and 7), as those were the supported versions.
  • Since we work with other internal teams such as RHOS-OPS, which use for example the RDO project, the upstream version of Red Hat OpenStack, we started adapting tests to work on both.
  • In addition, we started creating extra functions to operate on Debian systems, and a colleague also sent proposals to fix some issues on Arch Linux.
  • With the appearance of Spectre and Meltdown we also started adding checks for certain packages and for the protection options against those attacks not having been disabled.

Some numbers about plugins:

- healthcheck : 79 []
- informative : 2 []
- negative : 3 [‘system: 1’, ‘system/iscsi: 1’]
- openshift : 5 []
- openstack : 4 [‘rabbitmq: 1’]
- ovirt-rhv : 1 []
- pacemaker : 2 []
- positive : 35 [‘cluster/cman: 1’, ‘openstack: 16’, ‘openstack/ceilometer: 1’, ‘system: 1’]
- rhinternal : 697 [‘bugzilla/docker: 1’, ‘bugzilla/httpd: 1’, ‘bugzilla/openstack/ceilometer: 1’, ‘bugzilla/openstack/ceph: 1’, ‘bugzilla/openstack/cinder: 1’, ‘bugzilla/openstack/httpd: 1’, ‘bugzilla/openstack/keystone: 1’, ‘bugzilla/openstack/keystone/templates: 1’, ‘bugzilla/openstack/neutron: 5’, ‘bugzilla/openstack/nova: 4’, ‘bugzilla/openstack/swift: 1’, ‘bugzilla/openstack/tripleo: 2’, ‘bugzilla/systemd: 1’, ‘ceph: 4’, ‘cifs: 5’, ‘docker: 1’, ‘httpd: 1’, ‘launchpad/openstack/keystone: 1’, ‘launchpad/openstack/oslo.db: 1’, ‘network: 7’, ‘ocp-pssa/etcd: 1’, ‘ocp-pssa/master: 12’, ‘ocp-pssa/node: 14’, ‘openshift/cluster: 1’, ‘openshift/etcd: 2’, ‘openshift/node: 1’, ‘openshift/ocp-pssa/master: 2’, ‘openstack: 6’, ‘openstack/ceilometer: 2’, ‘openstack/ceph: 1’, ‘openstack/cinder: 5’, ‘openstack/containers: 4’, ‘openstack/containers/docker: 2’, ‘openstack/containers/rabbitmq: 1’, ‘openstack/crontab: 4’, ‘openstack/glance: 1’, ‘openstack/haproxy: 2’, ‘openstack/hardware: 1’, ‘openstack/iptables: 1’, ‘openstack/keystone: 3’, ‘openstack/mysql: 8’, ‘openstack/network: 6’, ‘openstack/neutron: 5’, ‘openstack/nova: 12’, ‘openstack/openvswitch: 3’, ‘openstack/pacemaker: 1’, ‘openstack/rabbitmq: 5’, ‘openstack/redis: 1’, ‘openstack/swift: 3’, ‘openstack/system: 4’, ‘openstack/systemd: 1’, ‘pacemaker: 10’, ‘satellite: 1’, ‘security: 3’, ‘security/meltdown: 2’, ‘security/spectre: 8’, ‘security/speculative-store-bypass: 8’, ‘storage: 1’, ‘sumsos/bugzilla: 11’, ‘sumsos/kbases: 426’, ‘supportability: 11’, ‘sysinfo: 2’, ‘system: 56’, ‘virtualization: 2’]
- supportability : 3 [‘openshift: 1’]
- sysinfo : 18 [‘lifecycle: 6’, ‘openshift: 4’, ‘openstack: 2’]
- system : 12 [‘iscsi: 1’]
- virtualization : 1 []
- total : 862

The Goal

  • Make it extremely simple to write new plugins.
  • Allow them to be written in your preferred programming language.
  • Keep it open so that anyone can contribute.

How to run it?
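
A minimal sketch of typical invocations; the exact flag names are assumptions, so check citellus --help for the authoritative list.

pip install citellus                         # or: git clone https://github.com/citellusorg/citellus
citellus --live                              # analyse the live system we are running on
citellus /var/tmp/sosreport-controller-0     # analyse an extracted sosreport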

Highlights

  • Plugins in your preferred language
  • Output can be written to a JSON file for processing by other tools.
    • The generated JSON can be visualised via HTML
  • Support for Ansible playbooks (live, and also against a sosreport if they are adapted)
    • The extensions (core, ansible) make it easy to extend the types of plugins supported.
  • Save/restore the configuration
  • Install from pip/pipsi if you don't want to use a git clone of the repository, or run from a container.

HTML interface

  • Created when using --web; open the generated citellus.html file over HTTP to view it.
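
A quick sketch of how that might look in practice (where the HTML file ends up depends on the output options used, which is an assumption here):

citellus --live --web          # run the checks and generate the HTML viewer alongside the JSON output
python3 -m http.server 8080    # then browse to http://localhost:8080/citellus.html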

Why upstream?

  • Citellus is an open-source project. All plugins are submitted to the repository on GitHub so they can be shared (that is what we want to encourage: reuse of knowledge).
  • Everyone is an expert in their own area: we want everyone to contribute.
  • We use an approach similar to other open-source projects: Gerrit for code review and unit testing to validate the basic functionality.

How to contribute?

Currently there is a strong presence of OpenStack plugins, since that is the area we work in daily, but Citellus is not limited to a single technology or product.

For example, it is easy to write checks on whether a system is correctly configured to receive updates, whether specific versions with known bugs (Meltdown/Spectre) are installed and the protections have not been disabled, excessive memory consumption by a process, authentication failures, etc.

Read the contributor guide at https://github.com/citellusorg/citellus/blob/master/CONTRIBUTING.md for more details.

Citellus vs other tools

  • XSOS: Provides system data (RAM, network, etc.) but does not analyse it; in practice it is a 'pretty' viewer of information.

  • TripleO-validations: runs only on 'live' systems, which is impractical for audits or for providing support.

Why not sosreports?

  • It is not a choice between one or the other: SOS collects system data, Citellus analyses it.
  • Sosreport ships in the base channels of RHEL and Debian, which makes it widely distributed, but this also makes it harder to receive frequent updates.
  • Much of the data needed for diagnosis is already in the sosreports; what is missing is the analysis.
  • Citellus is based on known failures and is easily extensible; it needs shorter development cycles and is more oriented towards devops or support teams.

What's under the hood?

A simple philosophy:

  • Citellus is the 'wrapper' that runs everything.
  • It lets you specify the folder containing the sosreport.
  • It looks for the plugins available on the system.
  • It runs the plugins against each sosreport and returns the status.
  • The Citellus framework, in Python, provides option handling, filtering, parallel execution, etc.

And the plugins?

The plugins are even simpler:

  • Written in any language that can be executed from a shell.
  • Output messages go to 'stderr' (>&2).
  • If strings in bash are written as $"string", the included i18n support can be used to translate them into whichever language you want.
  • Return $RC_OKAY if the test passes / $RC_FAILED for a failure / $RC_SKIPPED for skipped tests / any other value for unexpected errors.

And the plugins? (continued)

  • They inherit environment variables such as the root folder for the sosreport (empty in live mode) (CITELLUS_ROOT) or whether the run is in live mode (CITELLUS_LIVE). No keyboard input is needed.
  • For example, 'live' tests can query values in the database, while sosreport-based ones are limited to the existing logs.

Example script
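
Here is a minimal sketch of what such a plugin could look like, based on the conventions described in this talk; the file being checked (chrony.conf) and its path are illustrative assumptions.

#!/bin/bash
# Decide which file to inspect depending on live vs. sosreport mode.
if [[ "x$CITELLUS_LIVE" == "x1" ]]; then
    CONFIG="/etc/chrony.conf"
else
    CONFIG="${CITELLUS_ROOT}/etc/chrony.conf"
fi

# Skip if the component is not present on this system.
if [[ ! -e "$CONFIG" ]]; then
    echo "chrony is not configured on this system" >&2
    exit "$RC_SKIPPED"
fi

# Fail (with a message on stderr) if no time servers are defined.
if grep -q "^server\|^pool" "$CONFIG"; then
    exit "$RC_OKAY"
else
    echo "no time servers defined in $CONFIG" >&2
    exit "$RC_FAILED"
fi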

Ready to dig deeper into the plugins?

  • Each plugin must validate whether or not it should run, send its output to 'stderr', and set a return code.
  • Citellus will run and report on the tests according to the filters used.

Requirements:

  • The return code must be $RC_OKAY (ok), $RC_FAILED (failed) or $RC_SKIPPED (skipped).
  • Messages printed to stderr are shown if the plugin fails or is skipped (when verbose mode is used).
  • If run against a 'sosreport', the CITELLUS_ROOT variable contains the path to the specified sosreport folder.
  • CITELLUS_LIVE contains 0 or 1 depending on whether or not it is a live run.

How to start a new plugin (for example)?

  • Create a script at ~/~/.../plugins/core/rhev/hosted-engine.sh
  • chmod +x hosted-engine.sh

How to start a new plugin (continued)?

How to start a new plugin (with functions)?

How to test a plugin?

  • Use tox to run some UT checks (utf8, bashate, python 2.7, python 3)

  • Tell Citellus which plugin to use, as in the sketch below:
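
A rough sketch of both steps; the -i include-filter flag name is an assumption, so check citellus --help for the exact option.

tox                                                        # runs the utf8, bashate, python 2.7 and python 3 checks
citellus -i hosted-engine /var/tmp/sosreport-controller-0  # run only the plugins matching the filter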

What is Magui?

Introduction

  • Citellus works at the level of an individual sosreport, but some problems only show up across sets of machines (clusters, virtualisation, farms, etc.)

For example, Galera has to check the seqno across the various members to see which one holds the most up-to-date data.

What does M.a.g.u.i. do?

  • It runs citellus against each sosreport or system, collects the data and groups it by plugin.
  • It runs its own plugins against the collected data, highlighting problems that affect the whole set.
  • It can gather data from remote machines via ansible-playbook.
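
A hypothetical invocation sketch (the exact command name and arguments are assumptions): point Magui at several sosreports so its plugins can compare the per-host citellus results, for example the galera seqno mentioned above.

magui.py sosreport-controller-0/ sosreport-controller-1/ sosreport-controller-2/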

What does it look like?

Next steps with Magui?

  • It has a few plugins at the moment:
    • Aggregate citellus data sorted by plugin, for quick comparison
    • Show the 'metadata' information separately, to contrast values
    • pipeline-yaml, policy.json and others (OpenStack related)
    • galera seqno
    • redhat-release across machines
    • Faraday: compares files that should be identical or different across machines

Next steps

  • More plugins!
  • Spread the word about the tool so that, together, we can make it easier to troubleshoot problems, detect security flaws, incorrect configurations, etc.
  • Momentum: many tools die because they have a single developer working in their spare time; having contributions is essential for any project.
  • Write more tests in Magui to identify more cases where problems appear at the level of groups of systems rather than individual systems.

Other resources

Blog posts:

Questions?

Thanks for attending!!

Join #citellus on Freenode, https://t.me/citellusUG on Telegram, or contact us:

by Pablo Iranzo Gómez at May 16, 2019 05:30 PM

Aptira

Custom OpenStack Tools & Training

Aptira OpenStack Performance Icon

A large Telco leverages OpenStack components to host and manage Virtual Network Functions (VNFs) for external network services, running Red Hat's OSP as their OpenStack platform. They needed additional resources, as well as additional in-house skills and experience, to efficiently manage this platform.


The Challenge

One of our clients is in the process of creating a new platform. This platform consists of the tools, systems and hardware which are required to host, manage and connect Virtual Network Functions (VNFs) to external networks.

OSP is a complex product and their internal team lacked expertise in running, managing and operating such a large and involved system. They were also under time constraints and were concerned that they might not be able to deliver the system on time.


The Aptira Solution

Aptira offered a mix of resident engineer support and project-based resources to assist with the following tasks.

  • OpenStack Tuning: We provided consulting advice on best practice methods for tuning OpenStack so it functions better in their production environment. We also helped troubleshoot some issues they were facing during the deployment of Red Hat OSP.
  • OpenStack Training: We created a custom OpenStack training course and delivered an intensive training session to their operations team (about 10 people) so that they can efficiently manage all operational tasks in the environment.
  • OpenStack Testing: We developed Jenkins test suites using the Robot Framework. These test suites validate OpenStack and its base operating system after an upgrade or change.
  • OpenStack Monitoring: They currently use Zenoss to monitor OpenStack, but Zenoss data cannot be consumed directly by other systems. Hence, a custom program was developed to export Zenoss data to a Hadoop-based capacity planning system. This data is then used to generate capacity planning reports.

The Result

What started out as a short-term project to get their team up to speed with OpenStack has now turned into a long-term engagement with more projects planned to further enhance their OpenStack abilities.

With the custom OpenStack tools we created, they are now confident to complete upgrades and make required changes to the system without further assistance. Their OpenStack was built on schedule without delay and is now running in production.


Learn from instructors with real world expertise.
Start training with Aptira today.

View Courses

The post Custom OpenStack Tools & Training appeared first on Aptira.

by Aptira at May 16, 2019 01:23 PM

May 15, 2019

Aptira

Creating Virtual WAN Links using OpenKilda and Noviflow

A local Communications Provider is facing problems managing a Software Defined Wide Area Network (SDN-WAN): connectivity issues transporting different traffic patterns, such as video and voice, across multiple data centres. Utilising OpenKilda and Noviflow, we've been able to facilitate the transfer of complex network traffic between data centres.


The Challenge

This organisation needs to transmit very different types of traffic between data centres, with each type of traffic requiring different network resources such as bandwidth. In order to efficiently transport these different data types between data centres, they require a reliable and scalable mechanism to manage this traffic across multiple Points of Presence (PoP).


The Aptira Solution

We love OpenKilda. In fact, we may have a slight crush on it. It’s a highly-scalable and open source SDN controller, specifically deployed at scale to solve network latency issues. OpenKilda has been designed to deal with real-world challenges experienced by service providers and data centres and was originally developed for TPN – Telstra’s Programmable Network.

Based on SDN protocols, OpenKilda can:

  • Program SDN switches (such as NoviFlow switches)
  • Respond in real-time to changing traffic patterns
  • Track changes in the network infrastructure

OpenKilda integrates with the NoviFlow switches of the SDN-WAN deployed across data centres. Using OpenKilda, it's possible to create VLAN-based virtual links across the switches for sending different traffic patterns between the different data centres, as sketched below.
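
As a purely hypothetical sketch of what provisioning such a link could look like, the request below creates a VLAN-tagged flow through OpenKilda's northbound API. The endpoint path and JSON field names are assumptions for illustration; consult the OpenKilda northbound API documentation for the real schema.

curl -X PUT "http://openkilda.example.com/api/v1/flows/dc1-dc2-video" \
  -H "Content-Type: application/json" \
  -d '{
        "flowid": "dc1-dc2-video",
        "source":      {"switch-id": "00:00:00:00:00:00:00:01", "port-id": 10, "vlan-id": 100},
        "destination": {"switch-id": "00:00:00:00:00:00:00:02", "port-id": 10, "vlan-id": 100},
        "maximum-bandwidth": 500000,
        "description": "video traffic between data centres"
      }'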

By using Noviflow switches, we’re able to deliver consistent, seamless, edge-to-core SDN based centralisation and control. These switches were specifically designed for deployment in carrier networks and data centres looking to leverage the benefits of software-defined networking to improve the cost/performance, security, scalability and flexibility of networks.

The differences that OpenKilda brings to the table are as follows:

  • Complex Network Traffic: Support for complex and heterogeneous traffic patterns, e.g. the ability to categorise packets based on VLAN ID
  • Scalability: OpenKilda can accommodate up to 10,000 switches with 16 million flows
  • Telemetry: OpenKilda provides network statistics for monitoring and managing the network
  • Closed-Loop Automation: policies such as self-healing are also supported by OpenKilda and can be used to recover from link or device failure
  • Path Computation Engine: this adds great value by provisioning customers dynamically and rerouting traffic based on customer and network needs
  • Graphical User Interface (GUI): provides a friendly user interface to configure and provision the network

OpenKilda is highly scalable because it can detect changes in the number of flows between sources and destinations using SDN and Traffic Engineering, and in response, it can:

  • Allocate different bandwidth to different traffic types
  • Re-route packets if one path fails
  • Send packets via the path with the lowest latency, which might not be the shortest path.

This means we can accommodate more flows in the network and increase the scalability of the network. Traditional network methods require network engineers to configure every switch and router in the network to allow traffic flows between sources and destinations (e.g. via a WAN network between two data centres).

With the OpenKilda SDN Controller and using Noviflow SDN switches at the edge network, these manual steps are eliminated. In turn, managing and handling the network devices becomes much simpler, and that saves money. In addition, OpenKilda can increase the performance of the network by reacting to events in real-time. For example, scaling automatically as traffic patterns change. Moreover, OpenKilda’s GUI and telemetry information enable easy monitoring and performance management of the network.


The Result

By utilising OpenKilda and Noviflow, this Communications Provider can now implement services such as WAN links using VLANs that can be created and managed by OpenKilda. They can also leverage this Software Defined Network Controller (SDNC) to deploy varied services to support complex traffic patterns, facilitating the transfer of complex traffic between data centres.


Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Creating Virtual WAN Links using OpenKilda and Noviflow appeared first on Aptira.

by Aptira at May 15, 2019 06:19 AM

May 14, 2019

Aptira

Custom OpenStack Integration with Puppet

A Silicon Valley company who provides carrier-grade network services needs an OpenStack Cloud to be integrated with several of their existing systems – some of which are not commonly used.


The Challenge

Deploying an OpenStack Cloud is generally not a challenge for our Solutionauts, and this project started out no different. This organisation requires integration with many of their existing services, including Ceph, LDAP and DNS. But in addition to these basic OpenStack components, they also required the use of DNSaaS (Designate) and Container Orchestration (Magnum), which are not so commonly used.

Another challenge we faced was that they required a fully automated deployment using their existing Puppet infrastructure, from bare metal to ready-to-use OpenStack nodes. Everything needed to be defined in Puppet code, which would involve considerable work.


The Aptira Solution

Aptira developed a tailor-made Cloud solution based on the OpenStack Puppet modules. We then developed custom Puppet code to meet their integration requirements, as well as to bridge the gaps around configuring networking, repositories and so on on the hosts.

The deployment comprised three controller nodes and a number of compute nodes, running in HA mode, with the following configurations:

  • Glance, Cinder and Nova used their existing Ceph cluster as backend
  • Keystone used their LDAP as backend
  • Designate used their external DNS server as backend

Once the Cloud had been designed and the Puppet code developed, we executed the project remotely on their in-house infrastructure for testing.


The Result

The biggest challenge we faced whilst building this solution was developing custom Puppet code and automating the whole lifecycle, from empty bare-metal node to ready-to-use OpenStack, with Puppet, whilst controlling their servers remotely via an out-of-band management interface. But our Solutionauts love a challenge!

With proper testing and changes to the code, we successfully overcame these challenges, delivering the final configuration, integrating OpenStack with their existing systems and passing all acceptance tests.


Become more agile.
Get a tailored solution built just for you.

Find Out More

The post Custom OpenStack Integration with Puppet appeared first on Aptira.

by Aptira at May 14, 2019 06:08 AM

May 13, 2019

OpenStack Superuser

How Verizon Media rocks bare metal

DENVER — At the first Open Infrastructure Summit, James Penick, architecture director, offered a rare glimpse into the inner workings of entertainment giant Verizon Media.

Verizon Media (formerly Oath) is a subsidiary of the telecom giant that most of us probably use every day: TechCrunch, Huffington Post, Yahoo! and Engadget. Keeping these asset-rich sites up and running requires a heavy underpinning of reliable infrastructure.

Penick describes how and why Verizon Media transitioned from an infrastructure built almost entirely on proprietary, custom platforms to one predominantly powered by open infrastructure. His keynote was one of a series that underlined the importance of bare metal, here’s more on the recently announced OpenStack Ironic Bare Metal Program.

The numbers are eye-popping: hundreds of thousands of machines with millions of CPU cores on a private cloud that are mainly managed with open-source tech. Penick paints himself as something of an outsider, noting that a boss once told him he was the only software architect he’d ever met who didn’t insist on building everything from scratch himself.

“I told him that I chose to use OpenStack because it exists and it has an active and passionate global community who are always working to improve their product. The value of that force multiplication can not be understated.”

Although the team wasn’t starting from nothing, Penick realized they did have to start with bare metal. “We realized that before we grab our tools and build that house, it’s got to stand on something. It needs a foundation,” he says. “Bare metal is the rebar and concrete which form the foundation of your infrastructure. VMs, Containers, and functions are all awesome, but each platform stands on the one beneath.

While it may seem like the buzz word du jour, Penick says that bare metal is “not glamorous, it’s not the latest flashy, cool, new toy, but you’ve got to deal with it.”  So his team invested there, changed the business to make infrastructure-as-a-service bare metal an option, then a default and then “make it unavoidable.” That’s how he says that what was once a heretical idea became common sense.

And what did they build on it? Virtual machines, Kubernetes, containers and much more. “Our production workloads run at all levels of this stack and my dream is to one day push as much of our workloads into higher functions as possible.”

Penick concludes by encouraging active involvement in open source.

“Don’t just consume; participate and give back. The more engineering, operations and architectural talent you put in to contribute experience and perspective,” he says, “the more you help shape and guide the direction of the product to better suit your needs! I hope that our experiences moving over four million cores of infrastructure to open technologies like OpenStack Ironic will help give you the confidence to do the same.”

Check out the whole keynote below.

Photo // CC BY NC

The post How Verizon Media rocks bare metal appeared first on Superuser.

by Superuser at May 13, 2019 02:04 PM

Aptira

VMware to OpenStack Migration

A well-known video streaming organisation has been operating on traditional hardware infrastructure and needs to upgrade to a cloud-based solution. We set up an internal lab for them to test out OpenStack and help them get up and running on faster, up-to-date technology as soon as possible.


The Challenge

Previously, the customer had been running their system (which consisted of both static web hosting and video streaming) on VMware. To begin with, they successfully moved their static web hosting services to AWS, leaving just the video streaming services on the traditional infrastructure.

Unfortunately, due to the large bandwidth costs associated with streaming, it was not feasible for them to move the video services to a public cloud platform like AWS. Also, they already had their own internal infrastructure, which is perfectly capable of providing this bandwidth without the extra costs associated with hosting it externally. They wanted to provision a private Cloud solution utilising this internal infrastructure and complete a VMware to OpenStack migration, but weren’t sure where to begin. So they asked for our advice.


The Aptira Solution

OpenStack is a popular private Cloud solution and it’s easy to see why: with fewer resources required to operate the solution, users are provided with greater flexibility and reduced operational costs. Often, the OpenStack migration can be the most daunting part, but luckily there are several tools on the market to help simplify this process.

First of all, we designed and deployed Mirantis OpenStack directly onto their existing hardware in a lab environment using Mirantis Fuel. Fuel makes it easy to deploy and manage a variety of OpenStack distributions, accelerating the time-consuming, complex and error-prone process of deploying and running OpenStack at scale.

We then used GEMINI – our custom-built OpenStack migration engine – to migrate their SUSE Linux Enterprise Server (SLES) and Red Hat Enterprise Linux (RHEL) virtual machines from VMware to OpenStack. GEMINI has been specifically designed to automatically move workloads and VMs between platforms – addressing the various difficulties that both operators and administrators face when migrating existing workloads onto OpenStack. Also, it integrates completely with OpenStack projects, minimising deployment and maintenance efforts.


The Result

The client now has a fully functioning OpenStack lab environment to test their video streaming services. They have successfully evaluated OpenStack and decided that they would like to move their production video streaming services to their new in-house platform.

The video services will no longer be streaming externally – resulting in a significant reduction in operational costs, whilst also allowing them to take advantage of the new flexibility that their in-house private OpenStack cloud solution provides.


How can we make OpenStack work for you?
Find out what else we can do with OpenStack.

Find Out Here

The post VMware to OpenStack Migration appeared first on Aptira.

by Aptira at May 13, 2019 01:37 PM

May 11, 2019

Colleen Murphy

Denver III: Revenge of the Snowpenstack

YET AGAIN we returned to Denver, the city that brings snow and trains together. This time we were here not just for the Project Teams Gathering but also for the OpenStack Summit and Forum. Although the time allotment for all the activity this week was very compacted and our brains …

by Colleen Murphy at May 11, 2019 12:00 AM

May 10, 2019

Ghanshyam Mann

Open Infrastructure Summit: QA Summary for Summit & PTG

The Open Infrastructure Summit, followed by the OpenStack PTG, was held in Denver, USA from 29th April to 4th May 2019.

We had good discussions in the QA forum sessions at the Summit and at the PTG. I am summarizing the QA-related discussions below.

    Summit: QA Forum sessions:

    1. OpenStack QA – Project Update: Tuesday, April 30, 2:35pm-2:55pm

We gave an update on what we finished in Stein and a draft plan for the Train cycle.
The good thing to note is that we still have a lot of activity going on in QA.
Across all QA projects, we did >3000 reviews and 750 commits. The video is not up yet, so I am copying the slide link below. Slides: https://docs.google.com/presentation/d/10zupeFZuOlxroAMl29qVJl78nD4_YWHkQxANNVlIjE0/edit?ts=5cc73ae8#slide=id.p1

    2. OpenStack QA – Project Onboarding : Wednesday, May 1, 9:00am-9:40am

We did host the QA onboarding session, but there were only three attendees and no new contributors. I think it is hard to attract new contributors at summits now, so I am wondering whether we should keep hosting onboarding sessions at future events.

Etherpad: https://etherpad.openstack.org/p/DEN-qa-onboarding

    3. Users / Operators adoption of QA tools / plugins : Wednesday, May 1, 10:50am-11:30am

As usual, we had more attendees in this session and received useful feedback. A few tools were shared by attendees:

1. Python hardware module for bare metal detailed hardware inspection & anomaly detection https://github.com/redhat-cip/hardware

2. Workload testing: https://opendev.org/x/tobiko/

Another good idea from Doug was a plugin feature for the openstack-health dashboard. That is something we discussed at the PTG; for more details on this, refer to the PTG "OpenStack-Health Improvement" section below.
Etherpad: https://etherpad.openstack.org/p/Den-forum-qa-ops-user-feedback

    QA PTG: 2nd – 3rd May:

There were always 3-4 attendees in the room, with others joining for particular topics. We had good discussions and a few good improvement ideas about gate stability, dashboards and so on.

    1. Topic: Stein Retrospective:

We collected the things that went well and the things needing improvement in this session. In terms of good things, we completed the OpenStack gate migration from Xenial to Bionic, along with a lot of reviews and code. Doug from AT&T mentioned adding tempest and patrole to the gates and checks in their production deployment process: "Thank you for all of the hard work from the QA team!!!" Slow reviews are a concern, as we have a good number of incoming requests. This is something we should improve in Train.

Action items:

gmann: start the plan for backlogs especially for review and doc cleanup.

masayukig: plan to have resource leakage check in gate.

ds6901: will work with his team to clean up leaks and submit bugs

    2. Topic: Keystone system-scope testing:

The QA and Keystone teams gathered together in this cross-project session about the next steps for system scope testing. We talked through multiple points about how to cover all the new roles for system scope and how to keep backward-compatibility testing for stable branches, which still test without system scope. We decided to move forward with system_admin for now, and to fall back from system_admin to project scope if the system_scope testing flag is not set to true on the Tempest side (this leaves stable branch testing unaffected).

We agreed:

– To move forward with system admin – https://review.opendev.org/#/c/604909/

– Add tempest job to test system scope – https://review.opendev.org/#/c/614484/

– Then add it to tempest-full – gmann

– Then add testing for system reader

– Investigate more advanced RBAC testing with Patrole – gmann

Etherpad: https://etherpad.openstack.org/p/keystone-train-ptg-testing-system-scope-in-tempest

    3. Topic: new whitebox plugin for tempest:

This is a new idea from artom about testing things outside of Tempest’s scope (currently mostly used to check instance XML for NFV use case tests). Currently, this tool does ssh into the VM and fetches the XML for further verification, etc. We agreed to avoid duplicating any test verification already done by Tempest or the nova functional tests. This is good for extra verification done by going inside the VM, for example checking data after a migration, CPU pinning, etc. As a next step, artom will propose a QA spec with the details and a proposal to bring this plugin under the QA program.

    4. Topic: Document the QA process or TODO things for releases, stable branch cut:

The idea is to start a centralized doc page for QA activities, processes, etc. We want to use the qa-specs repo to publish the content to doc.openstack.org/qa/. This may not be so easy and will need a few tweaks to the doc jobs. I will get into the details and then discuss with the infra team. This is a low priority for now.

    5. Topic: Plugin sanity check:

The current tempest-plugins-sanity job is not stable, so it is non-voting. We want to make it voting by installing only the active plugins; many plugins are failing because they are either dead or not very active. We agreed to:
– blacklist faulty plugins, with a link to the bug/patch, and notify the ML every time we detect a failure
– publish the blacklist in the plugins-registry doc
– after that, make the job voting, and make the process of fixing or removing a faulty plugin (which unblocks the tempest gate) author self-approve
– make the sanity job run on plugins which depend on each other; for example, congress-tempest-plugin uses neutron-tempest-plugin, mistral-tempest-plugin, etc., so all these plugins should have a sanity job which installs and lists only those plugins' tests, not all the plugins.

    6. Topic: Planning for Patrole Stable release:

We had a good amount of discussion about the areas where Patrole needs to improve before releasing it as stable. Refer to the ML thread below for details and further discussion on this topic: – http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005870.html

    7. Topic: How to make tempest-full stable:

The current integrated-gate jobs (tempest-full) are not very stable due to various bugs, especially timeouts. We discussed a few ideas to improve them. Refer to the ML thread below for details and further discussion on this topic: http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005871.html

 

    8. Topic: OpenStack-Health Improvement:

Doug from AT&T has a few improvement ideas for the health dashboard, which were discussed at the PTG:

– Test grouping: define groups

– Assign tests to groups; filter by groups

– Compare two runs; look into pushing the AQuA report to subunit2SQL as a tool

Action Items:

– Doug is going to write the spec for the plugin approach. All the other ideas can be done once the plugin approach is ready.

– filter

– presentation

    9. Topic: Stein Backlogs & Train priorities & Planning:

We collected the Train items, along with assignees, in the etherpad below. If anyone would like to help with any of the items, ping me on IRC or reply here.

Etherpad: https://etherpad.openstack.org/p/qa-train-priority

    10. Topic: grenade zuulv3 jobs review/discussions:

We did not get the chance to review these. Let's continue after the PTG.

    Full Detail discussion: https://etherpad.openstack.org/p/qa-train-ptg

        

by Ghanshyam Mann at May 10, 2019 05:37 PM

Aptira

F5 NFV Automation

A major Telco needed to determine the feasibility of implementing end-to-end lifecycle management for their F5 Virtual Network Functions (VNF). So we spun up a lab environment and completed the evaluation process for them.


The Challenge

The telco’s IT team is relatively new and did not have the resources available to perform this validation. They had no internal lab infrastructure, no F5 technical knowledge and no staff able to perform the evaluation. Nonetheless, it was critical to the organisation’s business plan to show that this VNF could be lifecycle managed through the following use cases:

  • Onboarding
  • Building test configurations
  • Auto-healing
  • Auto-scaling
  • Removing test configurations

In order for the proposed solution to be accepted, we needed to verify that this solution would work not only as per the vendor claims, but also as per the client’s specific market requirements. Unfortunately, the client did not have a clear definition of a completed outcome for this evaluation, so there were no hard and fast success metrics. Rather, the customer wanted to approach this in an organic way, providing feedback as the project progressed and ongoing deliverables for us to achieve.


The Aptira Solution

This assignment required not only broad and deep technical knowledge, but also the ability to think on the fly as problems arose or as technical requirements were clarified. Aptira was able to allocate an internal team and spin up a dedicated lab environment to perform this evaluation, thus resolving the customer’s resource constraints. This team was spread between Australia (Sydney and Melbourne) and Taiwan.

Due to Aptira’s status as the Cloudify distributor for the Asia-Pacific area, and a deep base of knowledge on the Cloudify Service Orchestration platform, Aptira was able to leverage Cloudify’s framework for this solution.

In order to speed up the process, our team was able to assign tasks amongst themselves to work in parallel:

  • Lab access, environment details
  • VIM configuration and core software installation
  • Orchestration policy development and testing
  • F5 Technical ramp-up and liaison with F5 designated Technical consultant

We worked in consultation with the customer to prepare an architecture design that would not only demonstrate the capabilities of Cloudify but also give them the required direction for designing their Network Functions Virtualisation (NFV) platform for hosting Telco workloads. The following is a high-level diagram of this architecture, utilising Cloudify’s NFV Orchestrator and F5’s NFV Infrastructure:

The work was performed entirely using Aptira’s resources. Most development had previously been completed on the developer’s own infrastructure and integrated into Aptira’s own lab environment for testing and ultimately to demonstrate the results to the customer.

We found that the availability of an F5 technical consultant was a mandatory requirement, primarily due to the complexity of the NFV implementation. The implementation requires both a licensing server and a Virtual Network Functions Manager (VNFM) server to operate, plus license keys issued by F5: all the evaluation keys had expiry dates and the system would not function without them. The VNFM for the F5 loadbalancer/firewall was also quite complex and not intuitive, so the availability of a deeply knowledgeable F5 consultant minimised delays.

Jira and Confluence were used for feature/task tracking and documentation respectively. The team also used both Slack and email for online collaboration, ensuring fast communication and input into the ongoing deliverables.


The Result

Aptira completed the development of the use cases for the F5 loadbalancer/firewall exactly on schedule, under budget, with all required functionality and with no technical issues in any component of the stack.

It is worth highlighting that, as we mentioned earlier in this article, an F5 technical consultant should be seen as mandatory in F5 projects, in order to enable rapid turn-around (and even anticipation) of questions and issues.


Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post F5 NFV Automation appeared first on Aptira.

by Aptira at May 10, 2019 01:56 PM

May 09, 2019

StackHPC Team Blog

I/O performance of Kata containers

Kata project logo

This analysis was performed using Kata containers version 1.6.2, the latest at the time of writing.

After attending a Kata Containers workshop at OpenInfra Days 2019 in London, we were impressed by their start-up time, only marginally slower than that of ordinary runC containers in a Kubernetes cluster. We were naturally curious about their disk I/O bound performance and whether they also live up to the speed claims. In this article we explore this subject with a view to understanding the trade-offs of using this technology in environments where I/O bound performance and security are both critical requirements.

What are Kata containers?

Kata containers are lightweight VMs designed to integrate seamlessly with container orchestration software like Docker and Kubernetes. One envisaged use case is running untrusted workloads, exploiting the additional isolation gained by not sharing the Operating System kernel with the host. However, the unquestioning assumption that using a guest kernel leads to additional security is challenged in a recent survey of virtual machines and containers. Kata has roots in Intel Clear Containers and Hyper runV technology. They are also often mentioned alongside gVisor, which aims to solve a similar problem by filtering and redirecting system calls to a separate user space kernel. As a result gVisor suffers from runtime performance penalties. Further discussion on gVisor is out of scope in this blog.

Configuring Kubernetes for Kata

Kata containers are OCI conformant which means that a Container Runtime Interface (CRI) that supports external runtime classes can use Kata to run workloads. Examples of these CRIs currently include CRI-O and containerd which both use runC by default, but this can be swapped for the kata-qemu runtime. From Kubernetes 1.14+ onwards, the RuntimeClass feature flag has been promoted to beta, therefore enabled by default. Consequently the setup is relatively straightforward.
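
As a minimal sketch of what that wiring looks like on the Kubernetes side (assuming the CRI has been configured with a kata-qemu handler; the handler name must match your CRI configuration), a RuntimeClass and a Pod that opts into it can be created like this:

kubectl apply -f - <<EOF
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata
handler: kata-qemu
---
apiVersion: v1
kind: Pod
metadata:
  name: kata-demo
spec:
  runtimeClassName: kata
  containers:
  - name: demo
    image: busybox
    command: ["sleep", "3600"]
EOF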

At present Kata supports qemu and firecracker hypervisor backends, but support for the latter is considered preliminary, most notably lacking host-to-guest file sharing. This leaves us with kata-qemu as the current option, in which virtio-9p provides the basic shared filesystem functionality critical for this analysis (the test path is a network filesystem mounted on the host).

This example Gist shows how to swap runC for Kata runtime in a Minikube cluster. Note that at the time of writing, Kata containers have additional host requirements:

Without these prerequisites Kata startup will fail silently (we learnt this the hard way).

For this analysis a baremetal Kubernetes cluster was deployed, using OpenStack Heat to provision the machines via our appliances playbooks and Kubespray to configure them as a Kubernetes cluster. Kubespray supports specification of container runtimes other than Docker, e.g. CRI-O and containerd, which is required to support the Kata runtime.

Designing the I/O Performance Study

To benchmark the I/O performance of Kata containers, we present equivalent scenarios in bare metal and runC container cases to draw comparison. In all cases, we use fio (version 3.1) as the I/O benchmarking tool, invoked as follows, where $SCRATCH_DIR is the path to our BeeGFS (described in more detail later in this section) network storage mounted on the host:

fio fio_jobfile.fio --fallocate=none --runtime=30 --directory=$SCRATCH_DIR --output-format=json+ --blocksize=65536 --output=65536.json

The fio_jobfile.fio file referenced above reads as follows:

[global]
; Parameters common to all test environments

; Ensure that jobs run for a specified time limit, not I/O quantity
time_based=1

; To model application load at greater scale, each test client will maintain
; a number of concurrent I/Os.
ioengine=libaio
iodepth=8

; Note: these two settings are mutually exclusive
; (and may not apply for Windows test clients)
direct=1
buffered=0

; Set a number of workers on this client
thread=0
numjobs=4
group_reporting=1

; Each file for each job thread is this size
filesize=32g
size=32g
filename_format=$jobnum.dat

[fio-job]
; FIO_RW is read, write, randread or randwrite
rw=${FIO_RW}

In order to understand how the performance scales with the number of I/O bound clients, we look at 1, 8 and 64 clients. While the single client is instantiated on a single instance, for the cases with 8 and 64 clients they run in parallel across 2 worker instances, with 4 and 32 clients per bare metal instance respectively. Additionally, each fio client instantiates 4 threads which randomly and sequentially read and write a 32G file per thread, depending on the scenario.

All scenarios are configured with a block size of 64K. It is worth noting that the direct=true flag has not been supplied to fio for these tests as it is not representative of a typical use case.
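
Since the job file takes its access pattern from the FIO_RW environment variable, each of the four patterns can be selected per run along the lines of the loop below (the per-mode output file names are a small convenience added here):

for mode in read randread write randwrite; do
    FIO_RW=$mode fio fio_jobfile.fio --fallocate=none --runtime=30 \
        --directory=$SCRATCH_DIR --output-format=json+ \
        --blocksize=65536 --output=${mode}-65536.json
done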

The test infrastructure is set up in an optimal configuration for data-intensive analytics. The storage backend which consists of NVMe devices is implemented with BeeGFS, a parallel file system for which we have an Ansible Galaxy role and have previously written about. The network connectivity between the test instances and BeeGFS storage platform uses RDMA over a 100G Infiniband fabric.

Each test run is a combination of the following three dimensions:

  • Scenario: bare metal, runC containers, Kata containers
  • Number of clients: 1, 8, 64
  • Disk I/O pattern: sequential read, random read, sequential write, random write

The parameter space explored for the I/O performance study covers 36 combinations of scenario, number of clients and disk I/O pattern.

Results

Disk I/O Bandwidth

In these results we plot the aggregate bandwidth across all clients, demonstrating the scale-up bandwidth achievable by a single client and the scale-out throughput achieved across many clients.

Comparison of disk I/O bandwidth

Comparison of disk I/O bandwidth between bare metal, runC and Kata. In all cases, the bandwidth achieved with runC containers is slightly below bare metal. However, Kata containers generally fare much worse, achieving around 15% of the bare metal read bandwidth and a much smaller proportion of random write bandwidth when there are 64 clients. The only exception is the sequential write case using 64 clients, where Kata containers appear to outperform the bare metal scenario by approximately 25%.

Commit Latency Cumulative Distribution Function (CDF)

In latency-sensitive workloads, I/O latency can dominate. I/O operation commit latency is plotted on a logarithmic scale, to fit a very broad range of data points.

Comparison of commit latency CDF

Comparison of commit latency CDF between bare metal, runC and Kata container environments for 1, 8 and 64 clients respectively. There is a small discrepancy between running fio jobs in bare metal compared to running them as runC containers. However, comparing bare metal to Kata containers, the overhead is significant in all cases.

Mode              Scenario   1 client           8 clients           64 clients
                             50%      99%       50%      99%        50%        99%
sequential read   bare       1581     2670      2416     3378       14532      47095
                  runC       2007     2506      2391     3907       15062      46022
                  Kata       4112     4620      12648    46464      86409      563806
random read       bare       970      2342      2580     3305       14935      43884
                  runC       1155     2277      2506     3856       15378      42229
                  Kata       5472     6586      13517    31080      109805     314277
sequential write  bare       1011     1728      2592     15023      3730       258834
                  runC       1011     1990      2547     14892      4308       233832
                  Kata       3948     4882      4102     6160       14821      190742
random write      bare       1269     2023      3698     11616      19722      159285
                  runC       1286     1957      3928     11796      19374      151756
                  Kata       4358     5275      4566     14254      1780559    15343845

Table summarising the 50% and the 99% commit latencies (in μs) corresponding to the figure shown earlier.

Looking Ahead

In an I/O intensive scenario such as this one, Kata containers do not yet match the performance of conventional containers.

It is clear from the results that there are significant trade offs to consider when choosing between bare metal, runC and Kata containers. While runC containers provide valuable abstractions for most use cases, they still leave the host kernel vulnerable to exploit with the system call interface as attack surface. Kata containers provide hardware-supported isolation but currently there is significant performance overhead, especially for disk I/O bound operations.

Kata's development roadmap and pace of evolution provide substantial grounds for optimism. The Kata team are aware of the performance drawbacks of using virtio-9p as the storage driver for sharing paths between host and guest VMs.

Kata version 1.7 (due on 15 May 2019) is expected to ship with experimental support for virtio-fs which is expected to improve I/O performance issues. Preliminary results look encouraging, with other published benchmarks reporting the virtio-fs driver demonstrating 2x to 8x disk I/O bandwidth improvement over virtio-9p. We will repeat our analysis when the new capabilities become available.

In the meantime, if you would like to get in touch we would love to hear from you, specifically if there is a particular configuration which we may not have considered. Reach out to us on Twitter or directly via our contact page.

by Bharat Kunwar at May 09, 2019 03:00 PM

I/O performance of Kata containers

Kata project logo

This analysis was performed using Kata containers version 1.6.2, the latest at the time of writing.

After attending a Kata Containers workshop at OpenInfra Days 2019 in London, we were impressed by their start-up time, only marginally slower compared to ordinary runC containers in a Kubernetes cluster. We were naturally curious about their disk I/O bound performance and whether they also live up to the speed claims. In this article we explore this subject with a view to understanding the trade offs of using this technology in environments where I/O bound performance and security are both critical requirements.

What are Kata containers?

Kata containers are lightweight VMs designed to integrate seamlessly with container orchestration software like Docker and Kubernetes. One envisaged use case is running untrusted workloads, exploiting the additional isolation gained by not sharing the Operating System kernel with the host. However, the unquestioning assumption that using a guest kernel leads to additional security is challenged in a recent survey of virtual machines and containers. Kata has roots in Intel Clear Containers and Hyper runV technology. They are also often mentioned alongside gVisor, which aims to solve a similar problem by filtering and redirecting system calls to a separate user space kernel. As a result gVisor suffers from runtime performance penalties. Further discussion on gVisor is out of scope in this blog.

Configuring Kubernetes for Kata

Kata containers are OCI conformant which means that a Container Runtime Interface (CRI) that supports external runtime classes can use Kata to run workloads. Examples of these CRIs currently include CRI-O and containerd which both use runC by default, but this can be swapped for the kata-qemu runtime. From Kubernetes 1.14+ onwards, the RuntimeClass feature flag has been promoted to beta, therefore enabled by default. Consequently the setup is relatively straightforward.

At present Kata supports qemu and firecracker hypervisor backends, but the support for the latter is considered preliminary, especially a lack of host to guest file sharing. This leaves us with kata-qemu as the current option, in which virtio-9p provides the basic shared filesystem functionalities critical for this analysis (the test path is a network filesystem mounted on the host).

This example Gist shows how to swap runC for Kata runtime in a Minikube cluster. Note that at the time of writing, Kata containers have additional host requirements:

Without these prerequisites Kata startup will fail silently (we learnt this the hard way).

For this analysis a baremetal Kubernetes cluster was deployed, using OpenStack Heat to provision the machines via our appliances playbooks and Kubespray to configure them as a Kubernetes cluster. Kubespray supports specification of container runtimes other than Docker, e.g. CRI-O and containerd, which is required to support the Kata runtime.

Designing the I/O Performance Study

To benchmark the I/O performance Kata containers, we present equivalent scenarios in bare metal and runC container cases to draw comparison. In all cases, we use fio (version 3.1) as the I/O benchmarking tool.

In order to understand how the performance scales with the number of I/O bound clients, we look at 1, 8 and 64 clients. While the single client is instantiated on a single instance, the 8- and 64-client cases run in parallel across 2 worker instances, with 4 and 32 clients per bare metal instance respectively. Additionally, each fio client instantiates 4 threads which randomly or sequentially read and write a 32G file per thread, depending on the scenario.

All scenarios are configured with a block size of 64K. It is worth noting that the direct=true flag was not supplied to fio for these tests, since bypassing the page cache is not representative of a typical use case.
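
The exact fio invocation is not reproduced in this article, but a minimal Go wrapper that drives one scenario with the parameters described above might look like the following; the job name, test directory and default mode are illustrative assumptions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// One benchmark scenario per invocation; "read", "randread", "write" and
	// "randwrite" cover the four disk I/O patterns used in the study.
	mode := "randread"
	if len(os.Args) > 1 {
		mode = os.Args[1]
	}

	cmd := exec.Command("fio",
		"--name=kata-io-test",      // illustrative job name
		"--directory=/mnt/scratch", // path on the shared filesystem under test
		"--rw="+mode,               // I/O pattern for this scenario
		"--bs=64k",                 // 64K block size, as in all scenarios
		"--size=32g",               // 32G file per job
		"--numjobs=4", "--thread",  // 4 threads per client
		"--group_reporting",        // aggregate results across the 4 threads
		"--output-format=json",     // machine-readable output for later analysis
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "fio failed:", err)
		os.Exit(1)
	}
}
```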

The test infrastructure is set up in an optimal configuration for data-intensive analytics. The storage backend which consists of NVMe devices is implemented with BeeGFS, a parallel file system for which we have an Ansible Galaxy role and have previously written about. The network connectivity between the test instances and BeeGFS storage platform uses RDMA over a 100G Infiniband fabric.

Scenario           Number of clients   Disk I/O pattern
bare metal         1                   sequential read
runC containers    8                   random read
Kata containers    64                  sequential write
                                       random write
The parameter space explored for the I/O performance study covers 36 combinations of scenarios, number of clients and disk I/O pattern.

Results

Disk I/O Bandwidth

In these results we plot the aggregate bandwidth across all clients, demonstrating the scale-up bandwidth achievable by a single client and the scale-out throughput achieved across many clients.

Comparison of disk I/O bandwidth

Comparison of disk I/O bandwidth between bare metal, runC and Kata. In all cases, the bandwidth achieved with runC containers is slightly below bare metal. However, Kata containers generally fare much worse, achieving around 15% of the bare metal read bandwidth and a much smaller proportion of random write bandwidth when there are 64 clients. The only exception is the sequential write case using 64 clients, where Kata containers appear to outperform the bare metal scenario by approximately 25%.

Commit Latency Cumulative Distribution Function (CDF)

In latency-sensitive workloads, I/O latency can dominate. I/O operation commit latency is plotted on a logarithmic scale, to fit a very broad range of data points.

Comparison of commit latency CDF

Comparison of commit latency CDF between bare metal, runC and Kata container environments for 1, 8 and 64 clients respectively. There is a small discrepancy between running fio jobs in bare metal compared to running them as runC containers. However, comparing bare metal to Kata containers, the overhead is significant in all cases.

Number of clients              1               8                 64
Mode               Scenario    50%     99%     50%     99%       50%       99%
sequential read    bare        1581    2670    2416    3378      14532     47095
                   runC        2007    2506    2391    3907      15062     46022
                   Kata        4112    4620    12648   46464     86409     563806
random read        bare        970     2342    2580    3305      14935     43884
                   runC        1155    2277    2506    3856      15378     42229
                   Kata        5472    6586    13517   31080     109805    314277
sequential write   bare        1011    1728    2592    15023     3730      258834
                   runC        1011    1990    2547    14892     4308      233832
                   Kata        3948    4882    4102    6160      14821     190742
random write       bare        1269    2023    3698    11616     19722     159285
                   runC        1286    1957    3928    11796     19374     151756
                   Kata        4358    5275    4566    14254     1780559   15343845
Table summarising the 50% and the 99% commit latencies (in μs) corresponding to the figure shown earlier.

Looking Ahead

In an I/O intensive scenario such as this one, Kata containers do not yet match the performance of conventional containers.

It is clear from the results that there are significant trade-offs to consider when choosing between bare metal, runC and Kata containers. While runC containers provide valuable abstractions for most use cases, they still leave the host kernel open to exploitation, with the system call interface as the attack surface. Kata containers provide hardware-supported isolation, but currently there is a significant performance overhead, especially for disk I/O bound operations.

Kata's development roadmap and pace of evolution provide substantial grounds for optimism. The Kata team are aware of the performance drawbacks of using virtio-9p as the storage driver for sharing paths between host and guest VMs.

Kata version 1.7 (due on 15 May 2019) is expected to ship with experimental support for virtio-fs, which should address these I/O performance issues. Preliminary results look encouraging, with other published benchmarks reporting 2x to 8x disk I/O bandwidth improvements for the virtio-fs driver over virtio-9p. We will repeat our analysis when the new capabilities become available.

In the meantime, if you would like to get in touch we would love to hear from you, especially if there is a specific configuration which we may not have considered. Reach out to us on Twitter or directly via our contact page.

by Bharat Kunwar at May 09, 2019 03:00 PM

Thomas Goirand

OpenStack-cluster-installer in Buster

I’ve been working on this for more than a year, and finally, I am achieving my goal. I wrote an OpenStack cluster installer that is fully in Debian, and it is running in production for Infomaniak.

Note: I originally wrote this blog post a few weeks ago, though it was pending validation from my company (to make sure I wouldn’t disclose company business information).

What is it?

As per the package description and the package name, OCI (OpenStack Cluster Installer) is software to provision an OpenStack cluster automatically, with a “push button” interface. The OCI package depends on a DHCP server, a PXE (tftp-hpa) boot server, a web server, and a puppet-master.

Once computers in the cluster boot for the first time over the network (PXE boot), a Debian live system squashfs image is served by OCI (via Apache) to act as a discovery image. This live system then reports the hardware features of the booted machine back to OCI (CPU, memory, HDDs, network interfaces, etc.). The computers can then be installed with Debian from that live system. During this process, a puppet-agent is configured so that it will connect to the puppet-master of OCI. Upon first boot, OpenStack services are then installed and configured, depending on the server’s role in the cluster.

OCI is fully packaged in Debian, including all of the Puppet modules and so on. So just doing “apt-get install openstack-cluster-installer” is enough to bring in absolutely all dependencies, and no other artifacts are needed. This is very important, because it means one only needs a local Debian mirror to install an OpenStack cluster; no external components must be downloaded from the internet.

OCI setting-up a Swift cluster

At the beginning of OCI’s life, we first used it at Infomaniak (my employer) to set up a Swift cluster. Swift is the object server of OpenStack. It is a perfect solution for a (very) large backup system.

Think of a massive, highly available cluster, with a capacity reaching petabytes, storing millions of objects/files 3 times (for redundancy). Swift can virtually scale to infinity as long as you size your ring correctly.

The Infomaniak setup is also redundant at the data center level, as our cluster spans 2 data centers, with at least one copy of everything stored in each data center (the location of the 3rd copy depends on many things, and explaining it is not in the scope of this post).

If one wishes to use Swift, it’s ok to start with 7 machines: 3 machines for the controller (holding the Keystone authentication, and a bit more), at least 1 swift-proxy machine, and 3 storage nodes. Though for redundancy purposes, it is IMO not good enough to start with only 3 storage nodes: if one fails, the proxy server will fall into timeouts waiting for the 3rd storage node. So 6 storage nodes feels like a better minimum. These don’t have to be top-notch servers; a cluster made of refurbished old hardware with only a few disks can do it, if you don’t need to store too much data.

Setting-up an OpenStack compute cluster

Though Swift was the first thing OCI did for us, it can now do much more than just Swift. Indeed, it can also set up a full OpenStack cluster with Nova (compute), Neutron (networking) and Cinder (network block devices). We also started using all of that, set up by OCI, at Infomaniak. Here’s the list of currently supported services:

  • Keystone (identity)
  • Heat (orchestration)
  • Aodh (alarming)
  • Barbican (key/secret manager)
  • Nova (compute)
  • Glance (VM images)
  • Swift (object store)
  • Panko (event)
  • Ceilometer (resource monitoring)
  • Neutron (networking)
  • Cinder (network block device)

On the backend, OCI can use LVM or Ceph for Cinder, and local storage or Ceph for Nova instances.

Full HA redundancy

The nice thing is, absolutely every component set up by OCI is done in a highly available way. Each machine of the OpenStack control plane is set up with an instance of the components: all OpenStack controller services, a MariaDB server that is part of the Galera cluster, etc.

HAProxy is also set up on all controllers, in front of all of the REST API servers of OpenStack. And finally, the web address where final clients will connect is in fact a virtual IP that can move from one server to another, thanks to corosync. Routing to that VIP can be done either over L2 (i.e. a static address on a local network) or over BGP (useful if you need multi-datacenter redundancy). So if one of the controllers is down, it’s not such a big deal: HAProxy will detect this within seconds, and if it was the server that had the virtual IP (matching the API endpoint), then this IP will move to one of the other servers.

Full SSL transport

One of the things that OCI does when installing Debian is set up a PKI (i.e. SSL certificates signed by a local root CA) so that everything in the cluster is transported over SSL. HAProxy, of course, does the SSL termination, but it also connects to the different API servers over SSL. All connections to the RabbitMQ servers are also performed over SSL. If one wishes, it’s possible to replace the self-signed SSL certificates before the cluster is deployed, so that the OpenStack API endpoint can be exposed on a public address.

OCI as a quite modular system

If one decides to use Ceph for storage, then for every compute node of the cluster, it is possible to choose either Ceph or local storage for /var/lib/nova/instances. In the latter case, using RAID is of course strongly advised, to avoid any possible loss of data. It is possible to mix both types of compute node storage in a single cluster, and to create server aggregates so it is later possible to decide which type of compute server to run the workload on.

If a Ceph cluster is part of the deployment, then on every compute node the cinder-volume and cinder-backup services will be provisioned. They will be used to control the Cinder volumes of the Ceph cluster. Even though the network block storage itself will not run on the compute machines, this placement makes sense: the number of these processes needs to scale at the same time as the number of compute nodes. Also, on compute servers, the Ceph secret is already set up using libvirt, so it was convenient to re-use this.

As for Glance, if you have Ceph, it will use it as the backend. If not, it will use Swift. And if you don’t have a Swift cluster, it will fall back to the normal file backend, with a simple rsync from the first controller to the others. In such a setup, only the first controller is used for glance-api. The other controllers also run glance-api, but HAProxy doesn’t use them, as we really want the images to be stored on the first controller so they can be rsynced to the others. In practice, it’s not such a big deal, because the images are in the cache of the compute servers when in use anyway.

If one sets up Cinder volume nodes, then cinder-volume and cinder-backup will be installed there, and the system will automatically know that Cinder with the LVM backend is available. Both Cinder over LVM and over Ceph can be set up on the same cluster (I never really tried this, though I don’t see why it wouldn’t work; normally, both backends will simply be available).

OCI in Buster vs current development

Lots of new features are being added to OCI. These, unfortunately, won’t make it into Buster. Though the Buster release has just enough to be able to provision a working OpenStack cluster.

Future features

What I envision for OCI is to make it able to provision a cluster ready to serve as a public cloud. This means having all of the resource accounting set up, as well as CloudKitty (which is OpenStack’s resource rating engine). I’ve already played a bit with this, and it should be out fast. Then the only missing bit to go public will be billing of the rated resources, which obviously has to be done in-house, and doesn’t need to live within the OpenStack cluster itself.

The other thing I am planning to do is add more and more services. Currently, even though OCI can set up a fully working OpenStack, it is still a basic one. I do want to add advanced features like Octavia (load balancer as a service), Magnum (Kubernetes cluster as a service), Designate (DNS), Manila (shared filesystems) and much more if possible. The number of available projects is really big, so it will probably keep me busy for a very long time.

At this point, what OCI also misses is a custom ISO Debian installer image that would include absolutely everything. It shouldn’t be hard to write, though I lack the basic knowledge on how to do this. Maybe I will work on this at this summer’s DebConf. In the end, it could be a Debian pure blend (i.e. a fully integrated distro-in-the-distro system, just like debian-edu or debian-med). It’d be nice if this ISO image could include all of the packages for the cluster, so that no external resources would be needed. Setting up an OpenStack cluster with no internet connectivity at all would then become possible. In fact, only the API endpoint on port 443 and the virtual machines need internet access; your management network shouldn’t be connected (it’s much safer this way).

No, there weren’t 80 engineers who burned out in the process of implementing OCI

One thing that makes me proud is that I wrote all of my OpenStack installer nearly alone (truth: it leverages all the work of puppet-openstack, and it wouldn’t have been possible without it…). That’s unique in the (small) OpenStack world. At companies like my previous employer, or a famous company working on RPM-based distros, this kind of product is the work of dozens of engineers. I heard that Red Hat has nearly 100 employees working on TripleO. This was possible because I tried to keep OCI in the spirit of “keep it simple, stupid”. It does only what’s needed, implemented in the simplest way possible, so that it is easy to maintain.

For example, the hardware discovery agent is made of 63 lines of plain shell script (that is: not even bash… but dash), while I’ve seen others using really over-engineered stuff, like heavy Ruby or Python modules. Ironic-inspector, for example, in the Rocky release, is made of 98 files, for a total of 17974 lines. I really wonder what they are doing with all of this (I didn’t dare to look). There is one thing I’m sure of: what I did is really enough for OCI’s needs, and I don’t want to run a 250+ MB initrd as the discovery system: OCI’s live-build based discovery image, loaded over the web rather than PXE, is way smarter.

In the same spirit, the part that does the bare-metal provisioning is the same shell script that I wrote to create the official Debian OpenStack images. It started as about 700 lines of shell script to install Debian on a .qcow2 image; it’s now about 1500 lines, and made of a single file. That’s the smallest footprint you’ll ever find. Still, it does everything that’s needed, and probably even more.

In comparison, Fuel had a super-complicated scheduler, written in Ruby, used to provision a full cluster with a single click of a button. There’s no such thing in OCI, because I believe that’s a useless gadget. With OCI, a user simply needs to remember the order for setting up a cluster: Ceph mon nodes need to be set up first, then Ceph OSD nodes, then controllers, then finally, in no particular order, the compute, swiftproxy, swiftstore and volume nodes. That’s really not a big deal to leave to the final user, as it is not expected that one will set up multiple OpenStack clusters every day. And even so, if you use the “ocicli” tool, it shouldn’t be hard to automate these final bits. But I would consider this a useless gadget.

While every company has jumped into the microservices-in-containers thing, even now I continue to believe it is useless, and mostly driven by the needs of marketing people who have features to sell. Running OpenStack directly on bare metal is already hard, and the amount of complexity added by running OpenStack services in Docker is useless: it doesn’t bring any feature. I’ve been told that it makes upgrades easier; I very much doubt it: upgrades are complex for reasons other than just upgrading the running services themselves. Rather, they are complex because one needs to upgrade the cluster components in a given order, and scheduling this isn’t easy.

So this is how I managed to write an OpenStack installer alone, in less than a year, without compromising on features: because I wrote things simply, and avoided the over-engineering I saw at all levels in other products.

OpenStack Stein is coming

I’ve just pushed to Debian Experimental, and to https://buster-stein.debian.net/debian, the last release of OpenStack (code name: Stein), which was released upstream on the 10th of April (yesterday, as I write these lines). I’ve been able to install Stein on top of Debian Buster, and I could start VMs on it: it’s all working as expected after a few changes in the puppet manifests of OCI. What’s needed now is testing upgrades from Stretch + Rocky to Buster + Stein. Normally, puppet-openstack can do that. Let’s see…

Want to know more?

Read on… the README.md is on https://salsa.debian.org/openstack-team/debian/openstack-cluster-installer

Last words, last thanks

This concludes a bit more than a year of development. All of this wouldn’t have been possible without my employer, Infomaniak, giving me total freedom in the way I implement things for going into production. So a big thanks to them, and also for being a platinum sponsor of this year’s DebConf in Brazil.

Also a big thanks to the whole of the OpenStack project, including (but not limited to) the Infra team and the puppet-openstack team.

by Goirand Thomas at May 09, 2019 02:53 PM

Aptira

LDAP & Puppet

Aptira Puppet Logo

This organisation needed to build a lightweight directory access protocol (LDAP) farm to provide authentication services for thousands of users. The directory was required to provide login functionality for various ITS-managed systems, including high performance computing (HPC) clusters and clouds, and had to utilise technologies from their existing toolkit only – namely Puppet.


The Challenge

Not only does this new system need to support a large number of users, it also has to be secure, reliable and updatable; once built, it will completely replace their existing LDAP infrastructure. In order to properly align this new system with their existing configuration management system, a configuration management tool was needed to deploy and manage the entire system. The customer also requested that no new technologies be introduced into their existing system that weren’t already being used elsewhere.


The Aptira Solution

The customer’s existing DevOps toolkit used Puppet. With our expertise and love of Open Source technology, we were already ahead of the game when utilising Puppet for this solution. Puppet has several useful features – from configuration management to defining infrastructure as code and managing multiple servers simultaneously.

We developed Puppet modules to install OpenLDAP masters in active-active mode, and local slaves that are used for user authentication. There are remote slaves which are not in the same network as the masters; these remote slaves connect to the masters via LDAP proxies. The remote slaves are used by clients that do not have direct network access to the network where the masters are hosted. We have set up and tested LDAP clients on various operating systems, including CentOS/RHEL, Ubuntu and SLES.


The Result

Aptira staff have run several hand-over sessions to demonstrate how the system is designed as well as how it is used and operated. The LDAP farm is now running in production, offering authentication services to HPC users, cloud users and other ITS system users.

We have also provided complete documentation for the build and provided this to key staff who will be managing the system internally in future. Further to this, our staff run Puppet training courses to help external system architects, system administrators and DevOps staff to fully manage Puppet. This course covers all the essentials of Puppet, including writing manifests and leveraging the full toolset of the Puppet language.


Become more agile.
Get a tailored solution built just for you.

Find Out More

The post LDAP & Puppet appeared first on Aptira.

by Aptira at May 09, 2019 01:23 PM

SUSE Conversations

3 Reasons why the Open Infrastructure Summit in Denver was Simply Outstanding.

This was not my first rodeo. I’ve attended most of the OpenStack Summits since my first one in Atlanta, Georgia back in May of 2014. They have always been a highpoint in my working calendar and I look forward to each of them with tremendous enthusiasm. The first-ever Open Infrastructure Summit in Denver was no […]

The post 3 Reasons why the Open Infrastructure Summit in Denver was Simply Outstanding. appeared first on SUSE Communities.

by Terri Schlosser at May 09, 2019 01:00 PM

Galera Cluster by Codership

Meet Codership, the makers of Galera Cluster at Percona Live Austin 2019

After a short hiatus, we hope to meet and see you at Percona Live Austin 2019 (28-30 May 2019), as we have sponsored the event and have a booth in the expo hall, in addition to having some talks.

Our CEO and co-founder Seppo Jaakola will give a talk titled Galera Cluster New Features, happening in room Texas 5, on Wednesday at 11.55AM – 12.45PM. It will be a very interesting talk, as Galera Cluster 4 features have made their way into MariaDB Server 10.4, and you can expect to hear a little more about when Codership, the engineers and makers of Galera Cluster, will provide a MySQL version.

If you happen to sign up for tutorials, do not miss Expert MariaDB: Harness the Strengths of MariaDB Server by Colin Charles, as part of the tutorial involves setting up a bootstrapped three-node MariaDB Galera Cluster 10.4 with the Galera 4 replication library and learning about the other unique features it has.

For those interested in Galera Cluster, don’t forget there are a few other talks about Galera Cluster in the program, including one by Yahoo! Japan titled Immutable Database Infrastructure with Percona XtraDB Cluster which should be a great story about how they deploy a lot (think hundreds!) of Galera Cluster nodes in production.

Our booth will be manned by the wonderful Larisa Urse, and we want to talk to you all about what you need from Galera Cluster, roadmaps, plans, how you use Percona XtraDB Cluster (PXC) (based on Galera Cluster), and more. We will have great conversations, and a whole bunch more planned for the conference, including participating in the Passport Program — so drop by our booth, talk to us, and get that coveted stamp, and you will be in the running to win a pair of Bose noise cancelling headphones. Don’t forget that we have great support and consulting services too, so come talk to us to find out more!


by Sakari Keskitalo at May 09, 2019 04:46 AM

May 08, 2019

Ben Nemec

Denver Summit Recap

Just back from the Denver Summit and PTG, so here are my thoughts about the Summit. I expect to post my PTG wrapup to the openstack-discuss mailing list since it's more developer-specific.

Overall, I feel good about this one. It seemed like I had interesting sessions to attend almost every block for the three days. This hasn't always been the case for me at past Summits so it was a definite positive for this one. I will note that I was in a much happier place than I have been at some previous events, so it's possible this was just a perception change on my end. Looking through my list of attended sessions, I feel like they were legitimately interesting though.

Oslo

As PTL, this was obviously the main thing I was there for. We did a project update, and I also had a lot of good discussions with other folks in the community who were interested in contributing to Oslo. While I've had people approach me before, it felt like it happened quite a bit more at this Summit than previous ones. Fingers crossed that this will lead to an increase in Oslo contributors. :-)

Metal3

Metal Kubed was included in the keynote demos and kept coming up throughout the week as something people were interested in. This is good since it's the primary thing I'm working on these days. :-)

Keystone

I spent a bunch of time in Keystone sessions during the week. A lot of the discussions went way over my head, but there is quite a bit of work going on around policy and quota. Both of those involve Oslo libraries so they're relevant to my interests. The Oslo side of the policy work is done as far as I know, but currently the oslo.limit library for quota enforcement is essentially a bare cookiecutter repo. Plans were made at the Forum and PTG to move that along and I'm feeling good about where we're headed. You can find a bunch more details in the cross-project session etherpad.

Thanks to the Keystone team for tolerating me all week and thanks to the Nova team for some excellent cross-project sessions on policy and quota.

Services as Libraries

I attended a session that was trying to address how to deal with cross-project feature dependencies, i.e. when a feature proposed for one project depends on a feature recently added to a service. Currently this is problematic because services tend to be released only at major milestones during the cycle, so testing new features like this requires installing from source. A couple of options were floated:

  • Split the services into "service" and "service-lib". This would allow us to continue treating the services themselves the same way we do now, but release the -lib version more often. However, it also requires a non-trivial amount of refactoring work.
  • Treat the services as libraries and release them more often. The positive side of this strategy is that it doesn't require any real code changes in the services. The downside is that it might result in a number of major releases during a cycle as services remove deprecated features and such. This isn't a huge problem, but it might be confusing for people who are used to one major version bump per release. There's also the potential that releasing mid-cycle might result in partially complete features showing up in a release, but hopefully that can be mitigated by coordinating releases with the project teams.

Some followup discussion is planned with the affected services to determine whether they have the bandwidth to take on a large refactoring or if they would prefer the second option.

Storyboard

We've been looking at the Storyboard migration pretty much since I started as Oslo PTL, and this cycle one of the major blockers (lack of priority migration from Launchpad) was addressed. \o/

There are a few remaining concerns though. Attachment support is still WIP, search syntax is still mystifying to most users, and it sounds like there are some growing pains happening on the database side. The first two have plans in place to make progress, but it was mentioned that there is a lack of database optimization knowledge on the current Storyboard team. If you or someone you know have experience in this area and would like to help I'm sure they'd appreciate it.

Technical Vision Review

I sat in on this session mostly out of general interest since Oslo is excluded from the scope of the document and thus didn't do a vision review. However, it was pointed out that there may still be value in doing one since there are elements of the vision that Oslo could still contribute to. I (or someone else ;-) will have to take a second look.

Other than that, the majority of the session was discussing the value of the review and how to encourage the projects that haven't done one yet to do so. There were some interesting proposals in the session etherpad about how to make it easier to do the review process, which will hopefully help.

Autoscaling at Blizzard

I don't have a whole lot to add to this talk, other than the fact that if you didn't see it live you should go watch the replay when it's up. Blizzard is doing a lot of things right with OpenStack, and this is one of them.

PTL Tips and Tricks

This session went better than I dared hope. We pretty much filled the 40 minutes with quality discussion of how to be the best PTL you can be. I was able to share some of my knowledge and also learned quite a bit, including at least one thing that I will be implementing for Oslo soon (an alternative to courtesy ping lists in IRC meetings). This really deserves a separate writeup, but in the meantime you can see the etherpad for details about the discussion.

Image Encryption

Good news everyone: Oslo is no longer involved. ;-)

Actually, the real good news is that major progress was made on this. Getting a bunch of smart people together in a single room to discuss it was very helpful and it sounds like we have a workable plan to move this forward. It will no doubt still be an enormous amount of work since it crosses so many projects, but at least there weren't any clear blockers left at the end of the session. Major props to Josephine (Luzi) for sticking with this despite all of the delays.

Conclusion

As I mentioned earlier, I came out of this Summit feeling good about what we got accomplished. It seemed like most sessions were making progress and not getting stuck on bikeshedding or fundamental disagreements about the direction projects should be moving. I hope we can keep up that momentum throughout the cycle and have another good Summit in Shanghai.

by bnemec at May 08, 2019 04:56 PM

Ed Leafe

OpenStack PTG, Denver 2019

Immediately following the Open Infrastructure Summit in Denver was the 3-day Project Teams Gathering (PTG). This was the first time that these two events were scheduled back-to-back. It was in response to some members of the community complaining that traveling to 4 separate events a year (2 Summits, 2 PTGs) was both too expensive and … Continue reading "OpenStack PTG, Denver 2019"

by ed at May 08, 2019 04:41 PM

Aptira

Upskilling your Staff with OpenStack

Aptira OpenStack Cloud: OpenStack Planning, OpenStack Development, OpenStack Integration

Sometimes, a standard hardware solution just doesn’t cut it. Typical physical servers are limited by their hardware and performance can suffer as a result. This company wanted to upgrade their internal infrastructure to OpenStack but lacked the skills to do it. So, we trained them in the dark arts of OpenStack.


The Challenge

Like many organisations, this company had been using traditional VMWare virtualisation technology for a long time. They wanted to take advantage of recent technology trends and see if they could use this to their advantage by introducing OpenStack to their in-house systems. In order to do this efficiently, their staff needed to bring their OpenStack knowledge up to date, and their boss put this to the test by having them sit the Certified OpenStack Administrator (COA) exam.


The Aptira Solution

Aptira is the leading provider of OpenStack services in the APAC region. This includes private and hybrid clouds, managed services, consulting, custom development, automation, orchestration and of course technology training. Our global team has experience operating some of Australia’s first OpenStack clouds as well as some of its largest and fastest growing. We are the founders and prime motivators of OpenStack in Australia, India and Taiwan, with several of our staff holding seats on the OpenStack Board of Directors over the years.

We put this expertise to good use, providing the customer with 5 days of in-house OpenStack training. This training consisted of 50% theory and 50% lab exercises, giving students hands-on knowledge of OpenStack.

The course we provided covered the following topics:

  • OpenStack Introduction (overview, projects, architecture, provisioning)
  • Controller Nodes (dashboard, user management, message queuing, image management, storage)
  • Compute Nodes (virtualisation, instance management)
  • Network Nodes (networking concepts, network management)
  • Object Storage (object storage basics, architecture)
  • OpenStack Installation

The Result

Over the course of 5 days, 7 students attended the course which was delivered in-house by an Aptira engineer. All of our courses are delivered by our engineers, so students have the opportunity to learn from instructors with real world expertise. All students successfully completed the course with at least one student (that we know of so far) having taken and passed the COA exam.

Whilst the COA exam is being discontinued, Aptira will continue to provide OpenStack training and can assist you with upgrading your infrastructure to OpenStack.


Learn from instructors with real world expertise.
Start training with Aptira today.

View Courses

The post Upskilling your Staff with OpenStack appeared first on Aptira.

by Aptira at May 08, 2019 01:06 PM

Trinh Nguyen

Searchlight at Denver Summit 2019



In the last summit (Denver, CO), the Searchlight team had an opportunity to introduce its cool stuff to everyone. Due to visa issues, only one out of the three members could come to the US. Thuy Dang, one of the core team, delivered the presentation and had great conversations with the other community members.




The key points you can take away from the Project Update [1] and Project Onboarding [2] sessions are:
  • Review of Searchlight's current status: 3 active contributors
  • Features introduced in Stein: support for ES 5.x, bug fixes, multi-cloud vision
  • Features for Train: multi-cloud support, new resources indexed (e.g., Tacker, Octavia, etc.)
  • Introduction of the different ways you can contribute to Searchlight. Hopefully, after the summit, there will be more contributors interested in Searchlight.
You can check out the slides for the Project Update [3] and Project Onboarding [4] sessions.

Even though not many people attended the sessions, Thuy had a great chance to talk to some of the original Searchlight contributors and discuss our new direction with them. We got some comments and feedback and will bring them up at our next team meeting.

Ok, so, that's it.

We rock!!!


References:

[1] https://www.openstack.org/summit/denver-2019/summit-schedule/events/23634/searchlight-project-update
[2] https://www.openstack.org/summit/denver-2019/summit-schedule/events/23623/searchlight-project-onboarding
[3] https://docs.google.com/presentation/d/1UggaJ8cCrLq_HKW0R23XZ45oGhKv_D8kRB9TfYLv3Lk/edit?usp=sharing
[4] https://docs.google.com/presentation/d/1wSpjzQclM3EFVkhaAwOQQmkTwyXpv6UkCnlIfJSiZbY/edit?usp=sharing

by Trinh Nguyen (noreply@blogger.com) at May 08, 2019 08:09 AM

May 07, 2019

Aptira

Case Study – Complex Cloud Application Lifecycle Automation using Cloudify

Aptira Network Partner Logo: Cloudify

One of the world’s largest IT companies required a Cloud native application orchestration solution for their data management services which can be potentially applied to their other software technology product lines. We developed a highly extendable orchestration solution based on Cloudify – an award-winning Open Source application and network orchestration platform based on TOSCA.


The Challenge

This organisation needed the new orchestration solution to be integrated with multiple components – including their existing cloud environment (private DC/AWS/Azure), configuration management toolset (Ansible) and their application product line.

This integration was required in order to provide a centralised web-based management interface for complex deployments (such as automated service/resource deployment and updates), along with metrics monitoring, and to support day-2 operations in an automated fashion. Service function chaining in application deployment was also essential for the proposed orchestration solution: parameters must be able to pass from one deployment to the next, and one deployment may trigger others. Another bonus of this integration is that it allows the organisation to provide a user-friendly interface, in turn shielding their clients from operational changes.


The Aptira Solution

Aptira and Cloudify have partnered to deliver solutions for Open Infrastructure, SDN, NFV and ONAP for leading Telco’s and Enterprises, and we have delivered many multi-cloud solutions based on Cloudify’s orchestration platform. We utilised this deep expertise to develop TOSCA blueprints for Cloudify to meet the customers’ requirements, implementing the following:

  • Full application lifecycle management: The solution doesn’t just automate the initial deployment phases (such as installation and setup) but also makes it easy to update resources during runtime and monitor all post-deployment changes.
  • Resource management on a variety of infrastructures: The solution introduces a plugin mechanism to easily enable, simplify and unify the control and management of resources and application deployments across clouds (OpenStack, AWS, GCP, Azure), infrastructure (vSphere) and containers (Docker, Kubernetes) via a single orchestration system.
  • Central orchestration for Ansible configuration management: The solution allows the organisation to leverage existing Ansible playbooks and integrate them into the orchestration system rather than converting them into a different format. It is designed to provide different ways to orchestrate repeated operations via Ansible, including running Ansible playbooks on a centralised platform or on remotely managed hosts via SSH connections or native Ansible APIs.
  • Service deployment chaining: The solution allows complex resource provisioning and deployment steps in a fully automated fashion that will help the organisation to easily build application and service pipelines for their customers.
  • Auto scaling and recovery: The solution allows for automatic scaling functions as well as intelligent, automatic backups through centralised management, monitoring, scaling, and recovery operations.

The Result

Aptira’s application orchestration solution has been deployed in the organisation’s Azure cloud environment with their Ansible configuration management system, meeting the following requirements:

  • Automating required cloud resource provisioning before application deployment
  • Reusing existing or shared cloud resources
  • Deploying applications using Ansible playbook libraries via the centralised orchestration management interface
  • Capturing real-time performance metrics of deployed applications and resources
  • Allowing deployment dependency and complex application deployment chaining
  • Allowing resource changes and application configuration updates during runtime
  • Using Ansible playbooks through centralised orchestration management to regularly execute day 2 operations such as automatic backup

This solution can now be applied to their other software technology product lines, fulfilling the organisation’s operational scenarios and requirements for application deployment.


Become more agile.
Get a tailored solution built just for you.

Find Out More

The post Case Study – Complex Cloud Application Lifecycle Automation using Cloudify appeared first on Aptira.

by Aptira at May 07, 2019 01:18 PM

Mirantis

Quick Tip: Use Apache as a proxy server to access internal IPs from an external machine

Sometimes, you don't have a GUI, but you want to access a web server running on a local IP address. Use Apache as a proxy server to access that local IP address via an external IP address on that VM.

by Nick Chase at May 07, 2019 01:00 PM

May 06, 2019

OpenStack Superuser

The Open Infrastructure Denver Summit: What you need to know

DENVER — A generous dusting of snow only added more excitement to the first Open Infrastructure Summit. The sessions, workshops and lightning talks were enriched by the participation of people from over 50 countries using and contributing to over 30 open-source projects.

If you didn’t attend—or if you did and want a replay—Superuser collected the announcements, user stories and Forum discussions you may have missed.

You can also catch videos for the keynotes at the OSF website. Recordings for Summit sessions will be available soon.

Jump to roadmap & technical decisions
Jump to case studies
Jump to news from the OpenStack ecosystem
Jump to what’s next

Let’s start with the OpenStack Foundation announcements:

Kata Containers and Zuul were the first pilot projects confirmed as top-level open infrastructure projects by the OSF Board of Directors. The first pilot project was announced in December 2017 and the confirmation requirements were approved by the board earlier this year.

The OSF launched the OpenStack Ironic Bare Metal Program highlighting the commercial ecosystem for Ironic, at-scale deployments of Ironic, and evolution of OpenStack beyond virtual machines. Coinciding with the program launch, Superuser published two case studies—CERN and Platform9— highlighting the broad use of Ironic. Stay tuned to Superuser as we roll out more.

The Airship community released version 1.0. It’s already in production at AT&T and SKT. The first release delivers a wide range of enhancements to security, resiliency, continuous integration and documentation, as well as upgrades to the platform, deployment and tooling features.

Collaboration without boundaries

The next OpenStack release, Train, is scheduled to arrive in October 2019. At the Forum, the entire OpenStack community (users and developers) gathered to brainstorm requirements for the next release, share feedback on the past version and hold strategic discussions.

In Denver, the community discussed everything from cross-project best practices for integration with Linux distributions to edge computing use cases and TripleO architecture; check out the full list of Etherpads here.

The Project Teams Gathering (PTG) is an event organized by the OpenStack Foundation for engaged community members involved in teams working on one of the projects supported by the OSF (workgroups, development teams, special interest groups…). OpenStack projects from Barbican to Vitrage, SIGs from edge to security and pilot projects like Airship and StarlingX participated. Check out all the Etherpads here.

At the PTG, James Page proposed to end the Upgrades Special Interest group (SIG). Page asserts that upgrades in OpenStack are no longer a “special interest” but now an integral part of the philosophy of projects within the OpenStack ecosystem. “Although there are probably still some rough edges, we don’t think we need a SIG to drive this area forward any longer.”

During Monday keynotes, 11 outstanding contributors were recognized with the Open Infrastructure Community Contributor Awards, with quirky categories like the Bonsai Caretaker Award. There was also a special surprise award for Lauren Sell, former VP of OSF Marketing, for all her years of community building.

The NIST Public Working Group on Federated Cloud (PWGFC) has been working for close to two years to develop an approach to advancing the Federated Community Cloud, with a framework to support seamless implementations of disparate community cloud environments. The Open Research Cloud Alliance (ORCA) has been actively developing consensus across the scientific community stakeholders aimed at identifying and working collectively to mitigate and resolve those impediments, whether driven by technology choice, regulatory obligations or historical practices that interfere with the ability of globally dispersed researchers to effectively gain access to research data and resources.

In Denver, the groups discussed possible federation deployment and governance models that embody the key concepts and design principles being developed in the NIST/IEEE Joint working group and ORCA. They want to encourage developers, users and cloud operators to provide use cases and feedback as they move forward in these efforts.

The Kata Containers and Firecracker teams provided an update in Monday’s keynote to highlight progress around community collaboration and project integration. They also discussed rust-vmm, a cross-project collaborative initiative to develop container-specific hypervisors.

This just in from the Open Infrastructure ecosystem:

Mirantis announced a web-based SaaS application that enables users to quickly deploy a compact cloud and experience the flexibility and agility of Infrastructure-as-Code. Available next month, Model Designer for Mirantis Cloud Platform (MCP) helps infrastructure operators build customized, curated, exclusively open source configurations for on-premise cloud environments.

Red Hat added three new Red Hat OpenStack Platform customers—Algar Telecom (a leader for internet services in Brazil), the University of Adelaide (an Australian public university), and Vodafone Ziggo (a leading Dutch communications provider).

Red Hat Virtualization 4.3, the latest version of Red Hat’s Kernel-based Virtual Machine (KVM)-powered virtualization platform, will be generally available in May 2019. Built on the enterprise foundation of Red Hat Enterprise Linux, Red Hat Virtualization 4.3 is designed to deliver greater security, easier interoperability and improved integration across enterprise IT environments.

Public Health England, an executive agency of the Department of Health and Social Care in the United Kingdom, adopted Red Hat’s open hybrid cloud technologies to support modern digital public health services in the UK.

SoftIron announced the release of their newest Ceph-optimized storage appliance, HyperDrive Density+. SoftIron’s HyperDrive platform is a portfolio of dedicated Ceph appliances and management software, purpose-built for software-defined storage (SDS). SoftIron also published a HyperDrive case study highlighting how the Minnesota Supercomputing Institute unlocks cost savings and scalability.

Trillio landed the latest release of TrilioVault 3.2. The release extends TrilioVault’s cloud-native backup and recovery capabilities to companies using advanced, modern architectures for clouds and virtualized infrastructure.

VEXXHOST, who also took home this edition of the Superuser Awards, announced the latest update to their cloud computing offering, introducing Kubernetes Enablement. This offering enables businesses with existing OpenStack private clouds to integrate and consume Kubernetes as a fully managed solution.

Now, a word from open infrastructure users in production

Adobe Advertising Cloud shared the fast-track journey they took to reach seven data centers in as many months. The session also covered the seven challenging scenarios with StatefulSets, GitOps, autoscaling for machine learning and auto-remediation. From developing with spot instances in dev to multi-cloud in production, they explained how their cloud platform team dealt with an interesting set of challenges, including preventing the derailing of their Kubernetes journey.

Arm showed the progress made in running Kata Containers on aarch64 with two demos: One running Kata Containers with Docker on aarch64 and one running Kata Containers with Kubernetes and CRI-O on aarch64.

In addition to a keynote discussing how AT&T’s deployment of 5G is powered by an Airship-based containerized OpenStack cloud, AT&T also discussed their experience using bare metal Kubernetes clusters to run containerized workloads including OpenStack itself. Alan Meadows and Pete Birley covered topics from challenges when upgrading Kubernetes to unexpected fallouts that can occur when running complex workloads while maintaining these mission critical environments powering their 5G infrastructure.

Amy Wheelus, vice president of network cloud at AT&T, also provided an update on their 5G rollout plan in a blog post. “While we may not be implementing DevOps in the purest sense of the word, Airship, collaboration and the new organizational structure between development and operations were essential to delivering into production a new network cloud platform and a new 5G Packet Core in record-setting time. These teams met the challenge, and AT&T launched the world’s first standards-based mobile 5G network in December of 2018. Today, our 5G network is live in parts of 19 cities, and our goal is to have nationwide coverage using sub-6 spectrum by early 2020.”

Edge computing is a typical use case for multi-tenant deployments, where edge nodes can scale to large numbers of sites, distributed in distinct locations. In this session, Intel and Baidu used Kata Containers to implement FaaS and deploy the service on the edge side, explaining why enhanced isolation is necessary, what gaps need to be filled in this use case and how the Baidu edge team met the requirements for the real deployment.

Blizzard Entertainment, a leading developer and publisher of entertainment software, has been using OpenStack as a private cloud to host its game services since 2012. They shared their OpenStack autoscaling implementation built to support their best-selling team-based shooter Overwatch, focusing on the unique challenges of running video games in the cloud and the advantages of utilizing autoscaling.

Major upgrades can be daunting, even within a release cycle or two behind the latest. Box discussed how they keep their OpenStack cloud up to date, covering the steps its team took to plan, test and perform upgrades spanning four releases (Mitaka to Queens) during single maintenance windows.

CERN presented two sessions at the Summit, one covering the latest developments in its cloud, focusing on the latest container use cases in high energy physics and recent scale tests to prepare for the upgrades of the Large Hadron Collider and the experiments. The second session discussed steps they are taking to meet the computational needs that will increase dramatically with the next run of the LHC and how they are prioritizing ways to make more resources available to their private cloud clients.

China Mobile featured in two sessions, sharing how their 5G system platform is based on OpenStack’s NFV system architecture and CI/CD as an enabler to building next generation networks.

Among the difficulties arising from a large public cloud deployment, sizing the compute hosts and designing the message queuing and databases can be the most challenging. OVH shared how its team designed their database clusters, how they optimized configuration to handle the increase in requests, how they monitor database performance, and how they handle interventions on databases without impacting clients.

Reliance Industries Ltd. gave an overview and offered a demo of its Monasca architecture, which counts over 700 compute nodes and over 25,000 VMs in production. They are also working with the community to develop a roadmap for upcoming releases based on user experience.

Verizon Media relies on open source to run one of the largest open infrastructure environments in the world—over 4 million cores. More importantly, they are also deeply involved in the open source community that’s there to support it. In his keynote, James Penick, architecture director, discussed their open infrastructure strategy and the layers of open source technologies they use to power their business, which includes both OpenStack and Kubernetes.

What’s next

That’s a strong finish for the Denver Summit, but we’re already thinking about our next run.

We’re taking the next Open Infrastructure Summit across the pond to Shanghai, China the week of November 4.

The post The Open Infrastructure Denver Summit: What you need to know appeared first on Superuser.

by Superuser at May 06, 2019 03:30 PM

Ed Leafe

Open Infrastructure Summit, Denver 2019

The first ever Open Infrastructure Summit was held in the last week of April 2019 at the Colorado Convention Center in Denver, CO. It’s the first since the re-branding from OpenStack to Open Infrastructure began last year to be officially held with the new name. Otherwise, it felt just like the OpenStack summits of old. … Continue reading "Open Infrastructure Summit, Denver 2019"

by ed at May 06, 2019 03:27 PM

Aptira

Case Study – Full-Stack Evaluation of a Private Cloud Solution

Aptira Hexagon Icon

One of our Enterprise Communications Service Provider customers is currently in the process of rolling out a large private Cloud platform to enable the deployment of new and flexible services. They’d like to utilise the power and flexibility of Network Functions Virtualisation (NFV) to enable these new services.


The Challenge

The Telco wanted to perform an evaluation of different private Cloud components in a full-stack configuration that included a simulation of their own network infrastructure – all without setting up their own lab, and with minimal overhead on the customer’s team.

The full stack Cloud software infrastructure included:

  • Multiple VIMs based on two OpenStack distributions (Red Hat RDO and Mirantis Fuel)
  • NFVO Orchestration software (Cloudify)
  • Third-party components including:
    Apigee’s On-Prem API Management solution and
    InfluxData’s TICK Stack event management solution
  • OpenDayLight SDN Controller software
  • Multiple test VNFs in clustered configuration, e.g.
    Clearwater IMS
    F5 Load Balancer

The evaluation needed to be shown integrating to a number of enterprise Business Support Systems (BSS), Operations Support Systems (OSS), Cloud and networks systems.

In most evaluation situations, enterprise organisations usually have an established Model lab environment set up with both compute and network resources available for software tests, trials and evaluations. This type of Model lab is usually established by an organisation to provide a test bed for new software to operate on a platform of known characteristics, and with interfaces to multiple supporting systems, e.g. BSS and OSS platforms. The inter-networking was managed using OpenDayLight SDN for SD-WAN simulation.

Not only did this customer not have such a lab environment available in their timeframe, the size and scope of the evaluation would have exceeded most Model labs anyway.


The Aptira Solution

Aptira was able to solve this problem by performing the evaluation using its own networking and compute resources. Not only does Aptira keep multiple lab environments running for both customer and internal projects, but we used to be a hosting company, so we had both the available infrastructure resources and the technical skillsets required to jump on this problem and fix it – we’re highly experienced at spinning up multi-tenanted environments very quickly. 

Aptira reconfigured its own lab to provide the necessary compute and network infrastructure. Not only did the evaluation lab support the full Cloud stack but also simulated multiple sites with local Data Centre networks and Wide-area networks. 

This work was executed in six streams, configuring, integrating and testing:

  1. Physical infrastructure required to support the Cloud stack;
  2. VIM-layer virtualisation
  3. SDN configurations required to simulate the customer’s network and for traffic engineering purposes
  4. API Gateway functionality for 3 purposes:
    Evaluating the API Gateway functionality
    Simulating API transactions to / from thirteen external platforms
    Capturing API transactions for evaluation documentation
  5. Configuring and testing the event management for four purposes:
    End-to-end telemetry of NFV events for CLAMP and orchestration purposes
    End-to-end telemetry of operational events for systems management purposes
    Simulating behaviour of external systems (e.g. The distribution of event data to an external Analytics engine and the resulting inbound trigger events from analytics policy execution)
    Monitoring and capturing events for evaluation documentation
  6. Configuring and testing the orchestration policies

Aptira’s ability to rapidly configure its own resources to meet the customer’s needs saved the customer significant time and investment. 


The Result

Aptira successfully completed the build of the lab environment, the configuration of the NFVO full-stack, and all the required integration points. The evaluation was completed and received a high compliance rating. The Telco customer was able to validate the performance of the entire stack without impact to any of their infrastructure, software or people.


Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.


The post Case Study – Full-Stack Evaluation of a Private Cloud Solution appeared first on Aptira.

by Aptira at May 06, 2019 01:42 PM

May 03, 2019

Aptira

Case Study – Building a DevOps Toolkit for a Customised Kubernetes Deployment


One of the most recent projects our Solutionauts have worked on involved developing a DevOps toolkit for a customised Kubernetes deployment for one of our large Enterprise customers. This integration includes Kubernetes, VMware, Ansible, Golang, RabbitMQ, Harbor, Prometheus, Grafana and Helm. Sounds like a technology disaster waiting to happen!


The Challenge

Initially, an in-house toolkit had been developed by the customer’s internal team to deploy Kubernetes. Unfortunately, this internal team lacked the skills and resources required to design and build it, resulting in an unstable, unreliable and flaky system.

In order to make this system run more efficiently, they needed to implement a modern DevOps toolkit that could automate the deployment of a production-grade system. This Enterprise also required a new monitoring system to integrate with their existing operations platform, and a team to provide ongoing enhancements to the toolkit, freeing up internal resources to work on their existing business.


The Aptira Solution

To produce this toolkit, Aptira built a suite of Ansible playbooks (consisting of more than 20 roles). Some playbooks were based on our generic Kubernetes Installer, which can install a vanilla k8s system. We then customised the Kubernetes Installer, deployed Kubernetes on VMware and enabled the ability to provision VMware volumes for Kubernetes pods. We also deployed VMware’s Harbor registry server for storing Docker images, RabbitMQ servers (in HA) for messaging, and Prometheus for monitoring and alarming.

To integrate Prometheus with their internal alarm and event management system, Aptira developed a custom simple network management protocol (SNMP) trapper to convert Prometheus alerts into SNMP traps and send them to the internal alarming system. This SNMP trapper has been developed in Golang and is running inside a Kubernetes container.
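
To give a rough sense of the shape of such a trapper, here is a minimal sketch in Go using only the standard library. It is an illustration rather than Aptira’s actual code: the /alerts endpoint, the port, the selected label and annotation fields, and the sendTrap stub are all assumptions, and a real implementation would emit proper SNMPv2c traps (typically via an SNMP client library) against the customer’s MIB.

    // prometheus-snmp-trapper: minimal sketch of an Alertmanager webhook
    // receiver that forwards alerts as SNMP traps. Illustrative only; the
    // endpoint, port and sendTrap stub are assumptions, not production code.
    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // webhookPayload is the subset of the Alertmanager webhook body we use.
    type webhookPayload struct {
        Status string `json:"status"`
        Alerts []struct {
            Status      string            `json:"status"`
            Labels      map[string]string `json:"labels"`
            Annotations map[string]string `json:"annotations"`
        } `json:"alerts"`
    }

    // sendTrap is a placeholder: a real trapper would build an SNMPv2c trap
    // (typically via an SNMP client library) and send it to the alarm system.
    func sendTrap(severity, alertName, summary string) error {
        log.Printf("TRAP severity=%s alert=%s summary=%q", severity, alertName, summary)
        return nil
    }

    func alertHandler(w http.ResponseWriter, r *http.Request) {
        var p webhookPayload
        if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
            http.Error(w, "bad payload", http.StatusBadRequest)
            return
        }
        // One trap per alert in the notification group.
        for _, a := range p.Alerts {
            if err := sendTrap(a.Labels["severity"], a.Labels["alertname"], a.Annotations["summary"]); err != nil {
                log.Printf("trap failed: %v", err)
            }
        }
        w.WriteHeader(http.StatusOK)
    }

    func main() {
        http.HandleFunc("/alerts", alertHandler)
        log.Fatal(http.ListenAndServe(":9099", nil))
    }

In this shape, Alertmanager simply needs a webhook receiver pointing at the trapper’s service address inside the cluster, and the alarm system consumes standard SNMP traps as it would from any other device.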


The Result

With the help of this highly customised toolkit, all disasters have been avoided. Kubernetes has been deployed and the system is now running successfully in production. It is fully integrated with this Enterprise’s internal operations platform, so monitoring data and alerts are sent to the appropriate team for action.

We will continue to monitor the platform to ensure any enhancements are applied, continually improving the efficiency of this system. There’s more to come on the Golang part of this project – stay tuned!


Containerise your Application.
Get Microservices for Macroresults.


The post Case Study – Building a DevOps Toolkit for a Customised Kubernetes Deployment appeared first on Aptira.

by Aptira at May 03, 2019 01:34 PM

May 02, 2019

Aptira

Case Study – OpenStack Swift Deployment


One of Australia’s leading technology providers (whose name we can’t mention for security reasons) needed a secure multi-region private cloud to store their private data. We deployed a global cluster of OpenStack Swift, providing an even more secure storage solution for their data. Data sovereignty is just one reason to use a private cloud – it is important to utilise the flexibility and efficiency of cloud technology whilst at the same time protecting your data and keeping any sensitive information private.


The Challenge

Due to the highly sensitive nature of this customer’s data, they needed a private cloud to keep it secure. Previously, they had used a single-region standalone Swift deployment where all data was stored in one data centre. They now needed a multi-region Swift deployment integrated with the OpenStack Identity service, with the potential to expand to other OpenStack services.


The Aptira Solution

We love Swift! Aptira’s Solutionauts have loads of experience implementing Swift for our customers in Australia and across the APAC region, so when this project came along we already had a head start.

We proposed a containerised OpenStack solution deployed using Kolla-Ansible. The solution consists of a highly tailored OpenStack Pike deployment; however, only Keystone, Horizon and Swift were deployed in the initial stage.

Swift was set up as a global cluster in two regions, with objects first written to the local region, then replicated to the second region. Swift endpoints were put behind their existing load balancers and a local Docker registry was set up to speed up the deployment.
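
From the client’s point of view, a global cluster still looks like a single Swift endpoint: applications write to the proxy in their local region and Swift replicates the object to the other region in the background. The sketch below, using only Go’s standard library, shows that client-facing flow; the storage URL, token, container and object names are placeholders rather than the customer’s values, and in practice the token would come from Keystone.

    // Minimal sketch: write an object to a Swift global cluster through the
    // local region's proxy. Cross-region replication happens server-side in
    // Swift; the client only ever talks to one endpoint.
    package main

    import (
        "bytes"
        "fmt"
        "log"
        "net/http"
    )

    const (
        // Placeholders: in practice these come from Keystone authentication.
        storageURL = "https://swift-region1.example.com/v1/AUTH_demo"
        authToken  = "gAAAAAB-example-token"
    )

    // swiftPut issues a Swift API PUT against the given container/object path.
    func swiftPut(path string, body []byte) error {
        req, err := http.NewRequest(http.MethodPut, storageURL+path, bytes.NewReader(body))
        if err != nil {
            return err
        }
        req.Header.Set("X-Auth-Token", authToken)
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode >= 300 {
            return fmt.Errorf("swift PUT %s: %s", path, resp.Status)
        }
        return nil
    }

    func main() {
        // Create a container, then upload an object into it.
        if err := swiftPut("/backups", nil); err != nil {
            log.Fatal(err)
        }
        if err := swiftPut("/backups/db-2019-05-02.tar.gz", []byte("example payload")); err != nil {
            log.Fatal(err)
        }
        fmt.Println("object written to the local region; Swift replicates it to region two")
    }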

We ran into a bug in the Kolla images – rsync was not installed, resulting in failure of the cross-region replication. Not only do we love Swift – we love a challenge! We patched the images in their private Docker registry, swiftly removing the bug and reporting it upstream 😉


The Result

Aptira deployed our proposed solution into production – an OpenStack Cloud running Swift in two data centres as a global cluster, with each data centre as a separate region. Read/write affinity has been enabled so that a replica is written to the local DC before being replicated to the second DC, and this entire deployment has been automated by a customised and patched Kolla-Ansible solution.

Not only did this solution meet all their requirements and pass their acceptance tests – their highly sensitive data is now stored more securely than it was in their previous single-region standalone Swift deployment.


How can we make OpenStack work for you?
Find out what else we can do with OpenStack.


The post Case Study – OpenStack Swift Deployment appeared first on Aptira.

by Aptira at May 02, 2019 01:42 PM

May 01, 2019

Mike Dorman

Tips for a Successful *aaS

This is a more detailed write-up of a lightning talk I gave at the Open Infrastructure Summit in 2019. Slides are available here, and the video here. (See also the related talk, “Don’t Repeat Our Mistakes: Lessons Learned from Running Go Daddy’s Private Cloud“) Currently I work on the TechOps team at Twilio SendGrid, which

Read full post.

by Mike Dorman at May 01, 2019 04:28 PM

OpenStack Superuser

The Open Infrastructure Summit: What we’ve built and where we’re headed

DENVER — While many people who work in tech can point to a long and winding road, Jonathan Bryce may be one of the few who can say that cleaning toilets got him where he is today.

Bryce, now the executive director of the OpenStack Foundation, shared his story at the first Open Infrastructure Summit as a way to talk about the past and future of the organization. As a kid, he was very focused on music and his early interest in  computers was more or less just another hobby.

That changed a few years later in the mid 1990s.

“During a summer, I worked in San Antonio Texas at a commercial cleaning company cleaning toilets and office buildings and taking out trash for a company that one of my brothers was involved in,” he says. His brother asked if he would help get the company on this new-fangled thing called the world wide web. Could Jonathan create a website and get the company online?

“I got the idea that this commercial cleaning company needed a portal and a dynamic backend, so I built that for them,” he says. “Some of their customers really liked this website and said, ‘Hey, how did you get that? How can we do something similar?'”

It turns out that technology was actually a pretty good way to make money as a teenager, definitely easier than mowing lawns and cleaning toilets, “so this is when my life took a little bit of a turn towards technology.”

He later realized that technology alone was never going to be a long-term sustainable advantage. Introduced to open source while working at Rackspace, he realized that open source is not a marketing initiative or a business model, but “an innovation philosophy.”

Companies shouldn’t take on the burden of building all their technology alone, however. “Open collaboration is a powerful force for driving technology to change our lives and our world,” he says. With an open ethos, companies can add value by selling a product or by using the software to improve internal operations and enhance services to their internal and external customers.

He pointed to the hundreds of organizations present in the packed auditorium who are already doing so, including CERN, VEXXHOST and LeBoncoin.

If sharing responsibilities and code between outfits as different as a research center, a public cloud provider and an online classifieds startup sounds like a recipe for chaos, Bryce says, consider OpenStack today. The global community is among the top three most active open-source projects, has landed 19 on-time releases and counts thousands of users, millions of cores and a $6.1 billion commercial market.

OpenStack is one of the most successful projects in history, he says, because of the community and the problems it has set out to solve. The community that sprung up around OpenStack grew to over 100,000 people in almost every country on the planet (180) while expanding the kinds of activities people collaborate on.

Bryce adds, “I think all of these have created an incredible opportunity where individuals can influence the direction of our shared future in technology and our shared future as humans. Here we are focusing on one small but key aspect of those shared futures: how do we build the best infrastructure systems by collaborating across communities, companies, countries.”

A few examples of this collaboration without boundaries?  Bryce cites some project examples: Kubernetes paired with OpenStack, Zun as a virtual-kubelet, massive open cloud, Open Heterogeneous Computing Framework and Rust-vmm.

“As we kick off this first Open Infrastructure Summit, that’s what I want to encourage all of us to do this week and over the coming months and years,” Bryce says, noting that 30 open source communities are participating at the Summit. “Let’s all bring our different perspectives, experiences, skills and contributions together to create something powerful, new, and useful and keep pushing our corner of the world to be more open and collaborative.”

Check out the full keynote here.

The post The Open Infrastructure Summit: What we’ve built and where we’re headed appeared first on Superuser.

by Nicole Martinelli at May 01, 2019 02:06 PM

Aptira

Case Study – Introducing a New SDN Controller

A Tier 1 Telco has been experiencing ongoing operational issues around network stability, SDN controller version lag and gaps in pre-production testing procedures that have been plaguing their platform. Aptira has provided guidance on possible replacement SDN controller platforms whilst adhering to strict functional and non-functional criteria.

The Challenge

Whilst Software Defined Networking (SDN) is still a relatively new technology in the Telecommunications space, there have been many advances in SDN-related projects and technology since the platform was first designed and implemented. Given the ongoing stability issues that this Telco has been facing, this was a good time to look around at the current environment and evaluate any contenders for replacing the components that are currently causing these ongoing issues.

Due to Aptira’s long history of dealing with similar environments, and a significant pool of internal resources with engineering backgrounds, we were able to propose a new solution that would resolve the ongoing issues. Recognising the customer’s problem, Aptira proactively moved to assist by executing the first step: identifying a potential replacement SDN controller.


The Aptira Solution

To begin, Aptira defined both functional and non-functional criteria to drive the analysis.

Functional Criteria:

  • Compatible with Cloudify
  • OpenFlow version 1.3 minimum
  • Non-OpenFlow based Protocol used
  • Compatible with Noviflow
  • API driven interfaces
  • Backing Datastore Software
  • Provides or is compatible with PCE functionality
  • Segment Routing Functionality
  • Telemetry Available

Non-Functional Criteria:

  • Georedundant
  • Fault tolerant failover
  • Fault tolerant southbound API
  • Cluster Type
  • Distributed Datastore CAP Guaranteed Metrics
  • Open Source
  • Development Language
  • Community Size and Type

Based on these criteria, Aptira performed global market research to identify suitable SDNC candidates. Initially, our list consisted of 12 potential SDN Controllers, which was subsequently cut back to six after our initial investigation.

Aptira’s analysis was based primarily on available specifications, technical documentation, and published research papers.  Our team of expert-level SDN engineers created the evaluation matrix that was used in the analysis and applied the relevant data gained through our research.


The Result

Whilst we believe that any of the platforms we analysed could fill the place of the current platform, the chosen controller will largely depend on the decisions the customer makes about further developing, operating and maintaining the platform.

Ultimately, for any replacement platforms to be considered viable, they must enable the following outcomes:

  • Interoperability with the other existing components in the platform, particularly the Cloudify platform and Noviflow switch hardware
  • Be in alignment with the Project Requirements
  • Implement, or allow for the implementation of, all existing technical functional requirements provided by the current SDN Controller
  • Remediate the current stability issues occurring with the Platform related to the SDN Controller.

Given the period of time since initial design and implementation, the replacement should additionally provide:

  • A well-supported software base, in line with their organisational goals
  • An updated architecture based on updated industry experience
  • Inclusion of any applicable technology enhancements that have evolved

After our research had concluded, we provided a comprehensive report that included a description of the SDN Controllers we evaluated, a detailed comparison of the viable replacements for the current SDN controller, as well as recommendations and a plan to assist them in progressing towards an informed decision.


Let us make your job easier.
Find out how Aptira's managed services can work for you.


The post Case Study – Introducing a New SDN Controller appeared first on Aptira.

by Aptira at May 01, 2019 01:57 PM

OpenStack Superuser

Open Infrastructure Community Contributor Awards: Denver Summit edition

DENVER — The Community Contributor Awards offer recognition to those who might not be aware that they are valued.

These awards are a little informal and quirky but still honor the extremely valuable work that everyone does to make the open infra community excel. These behind-the-scenes heroes are nominated at every Summit by other community members.

There were three main categories: those who might not be aware that they are valued, those who are the active glue that binds the community together and those who share their knowledge with others.

OSF’s upstream developer advocate Kendall Nelson runs the program and handed out the honors during the keynotes. More on the Denver honors below in the words of the community members who nominated them.

There was also a special surprise award for Lauren Sell, former VP of OSF Marketing, for all her years of community building.

The “Does anyone actually use this?” Award

Stig Telfer

He works tirelessly to shepherd and corral the Scientific OpenStack SIG. He’s always willing to pitch in with any OpenStack or open infrastructure-related meetup or event, going above and beyond every time. He’s continually putting in extra effort to further understanding of both OpenStack and Ceph at extreme scale, sharing learnings and results with the rest of the community, often engaging with upstream in order to help progress the quality of software that the rest of us get to use. He’s also one of the nicest, most honest people you’re ever likely to meet.

Bug Czar

For the individual who does the most to deal with the bugs no matter how big and ugly.

Sławek Kapłoński
He’s been on point for debugging and fixing a variety of Neutron gate failures during this cycle. Without him, we would have a bunch of bugs sitting around impacting our ability to merge code that all need triaging, debugging and fixing. Thankfully we have had him on deck to deal with that for us.
In particular, he tackled bugs that affected many Tempest runs and represented real issues with Neutron failing to plug VIFs properly on instances, such as https://bugs.launchpad.net/neutron/+bug/1808171, which Slaweq was able to debug and get a fix sorted out for.

Open Infrastructure Shield

For those who push to keep infrastructure projects compatible, open and available to everyone worldwide.

Allison Randal

She was instrumental in getting the project confirmation guidelines drafted and confirmed so the OSF pilot projects could become top-level projects. She led countless discussions, readily answered questions in public forums and really guided the community through this process.
She has contributed to the future success of all OSF projects. This was not an easy or quick task; the entire community, from Kata and Zuul to Airship and StarlingX, along with other open-source projects looking to be successful, benefited from her hard work and leadership.

Friends of Mike Rowe, Doers of the Dirty Jobs

Like the TV series, these are hard-working people who earn an honest living doing the kinds of jobs that make civilized life possible for the rest of us.

Carlos Venegas Munoz and Marco Vedovati

Their work on releases and packaging is not super sexy, but super important and often thankless jobs.
Vedovati helped fix and improve the Open Build Service (OBS) for Kata Containers. This was critical to make it easier to package Kata for multiple Linux distributions. He also volunteered as a Kat Herder to help the project work through and resolve issues. He made major contributions to the Kata Containers distribution build and packaging process – driving Kata integration into SUSE, as well as making significant improvements to Kata Containers release infrastructure and packaging for other distributions. He also volunteered as a weekly Kat Herder on several occasions to progress and resolve project issues.

Mentor of Mentors

For serious efforts in sharing knowledge with others. It’s easy enough to solve a problem yourself, but teaching others how to solve problems is no easy feat.

Samuel de Medeiros Queiroz
It wouldn’t be fair to recognize him for just one thing, since he has been contributing to the community for several years and in different ways. He’s been a really strong advocate for OpenStack in Brazil and one of the leading figures in the whole of Latin America. He helped organize OpenStack Days in Brazil, an event that has brought OpenStack to hundreds of individuals in Brazil and the surrounding region. Every year this event has gained a lot of traction and helped lots of people learn about OpenStack. He is also a great mentor, previously for Outreachy, and now acts as a coordinator for that internship program, continuing the effort that started back in 2013. He’s also participated in other mentoring efforts, including the Women of OpenStack lunch-and-learn events and the Upstream Training. Last but not least, as an engineer he has made really important contributions to the Keystone project and helped make that project what it is today. He’s a great example of someone who loves the OpenStack community and deserves recognition.

Hero of the People

Some people stand up for the masses and work to make leadership better. Some people we are happy to call our hero. They make sure community members are heard and understood.

Ian Jolliffe
He’s attended 10 OpenStack Summits, is a member of the StarlingX TSC and recently served as a member of the Edge Track Programming Committee for the Denver Summit. His contribution to the track included a focus on case studies and technical solutions, which also resulted in more 5G content, an emerging topic at the Summit.
As a member of the StarlingX Technical Steering Committee, he is doing a great job of organizing the group and making sure they spend time on all relevant topics when they meet. He actively represents the community at various industry events, has great attention to detail and is nice to work with.

Bonsai Caretaker

These people keep pressing the button to feed the Tamagotchi, keeping it alive.

Gabriela Cervantes Tellez and Salvador Fuentes
They’ve both been fighting the fight and keeping our continuous integration up and running. It’s a thankless job and it’s also one of the most important factors in the success of the Kata Containers project.

The Giving Tree

Always around to give you what you need and help you keep moving forward.

James O. D. Hunt
He’s invested a lot of time in explaining to new contributors the steps needed to get their patches merged.
James is also likely the person who, more than anybody else, spends the time to make PRs advance (reviews, suggestions, pinging…). When new potential issues come up, he’s also been very proactive in tracking them in GitHub and pinging the relevant people for an opinion or solution.

The Key to Stack City

In some countries an ornamental key is presented to esteemed visitors, residents, or others whom the city wishes to honor. The Key to Stack City is reserved for those who have done much for OpenStack and are truly a friend to all those in the community.

John Dickinson
He not only binds the community together, for many he is the embodiment of the community. The longest running project team lead, a record that is very unlikely to ever be bested, he’s often a contributor’s first contact with Swift and always helpful and welcoming, whether on IRC or in person. He has undoubtedly had the most influence over Swift’s culture, working tirelessly to ensure that contributors’ perspectives and motivations are understood and represented. This both empowers contributors and allows him to recommend collaborators when he sees interests align. As a result, Swift’s community tends to have high levels of trust and camaraderie. John regularly challenges norms. Whether examining human processes, technical challenges, or even just the layout of a room, John is driven to find what works best for the community. While he will give consideration to what has worked for other teams (both within OpenStack and beyond it), he is willing to change tack if an idea doesn’t seem to be working out. He will not accept something “because that’s how it’s always been done.”

 

Stay tuned for news on when the next nominations open!

The post Open Infrastructure Community Contributor Awards: Denver Summit edition appeared first on Superuser.

by Superuser at May 01, 2019 01:05 PM

And the Superuser Award goes to…

DENVER — The OpenStack community and Superuser editorial advisors have weighed in on the finalists and chosen the winner of this edition of the Superuser Awards; the previous award went to City Network at the OpenStack Berlin Summit Superuser Awards, sponsored by Zenko.

VEXXHOST took the prize home for this edition. The company is a leading Canadian public, private and hybrid cloud provider with an infrastructure powered by OpenStack. VEXXHOST has been contributing to the OpenStack community since its second release in 2011.

Previous winner City Network presented the awards during the Monday keynotes.

Nominees for this round of awards included EnterCloudSuite, The National Supercomputer Center in Guangzhou (NSCC-GZ),  and Whitestack.

The Superuser Awards launched in 2014 to recognize organizations that have used OpenStack to meaningfully improve their business while contributing back to the community. Previous winners include AT&T, CERN, China Mobile, Comcast, NTT Group, Paddy Power Betfair and UKCloud.

Stay tuned for the next cycle of nominations as the Superuser Awards head to the Shanghai Open Infrastructure Summit!

The post And the Superuser Award goes to… appeared first on Superuser.

by Superuser at May 01, 2019 08:31 AM

April 30, 2019

The Official Rackspace Blog

Aeroméxico Thrives with Rackspace and Red Hat

Conferences are valuable because they offer face time with customers, partners and industry colleagues, allowing us to share and learn a great deal in a short amount of time. We’re looking forward to Red Hat Summit 2019 for that very reason. This year, we’re featuring the story of Aeroméxico, which in three short years has […]

The post Aeroméxico Thrives with Rackspace and Red Hat appeared first on The Official Rackspace Blog.

by Pierre Fricke at April 30, 2019 11:00 AM

Stephen Finucane

Working With Documentation, The Openstack Way

This talk was co-presented with Alex Settle at the Denver OpenStack Summit in April 2019. It serves as a brief history of documentation within OpenStack along with a how-to on contributing to documentation today.

April 30, 2019 12:00 AM

April 29, 2019

OpenStack Superuser

OpenStack Ironic Bare Metal Program case study: Platform9

The OpenStack Foundation announced that its Ironic software is powering millions of cores of compute all over the world, turning bare metal into automated infrastructure ready for today’s mix of virtualized and containerized workloads. Some 30 organizations joined for the initial launch of the OpenStack Ironic Bare Metal Program, and Superuser is running a series of case studies to explore how people are using it.

Platform9 customers have been using Ironic for more than two years; here, members of the team talk about why they adopted it and what benefits they’ve seen.

First, a little background on the San Francisco-based startup: Platform9 provides a software-as-a-service offering to deploy and operate OpenStack hybrid clouds for KVM, VMware and public cloud environments. Platform9’s SaaS platform helps customers get an OpenStack hybrid cloud in minutes. In addition, Platform9’s Managed Kubernetes service allows customers to deploy and run Kubernetes on any infrastructure of their choice: OpenStack, bare metal, VMware, public clouds, or on the edge. Many of their customers’ workloads are suitable for running in a virtualized environment; however, some customer requirements and workloads require them to run on bare metal, bypassing the hypervisor.

Why did you select OpenStack Ironic for your bare metal provisioning in your product?

Internally within Platform9, our test teams need to provision and deploy our software for validation on a variety of hardware targets, such as different hardware manufacturers (HPE, Dell, etc.), processor types (AMD, Intel), GPU resources (Nvidia), storage types (Optane), or special resources (SR-IOV, DPDK, etc.). This was a very manual and time-consuming process, slowing down our software release velocity and impacting release schedules.

We selected OpenStack Ironic to address the above customer needs as well as our internal need to speed up our hardware testing process and deliver our services faster to our customers.

What was your solution before implementing Ironic?

Without Ironic, the whole process from racking and stacking a server to the time users got access to it would take weeks. Because of this, the different teams that acquired those resources would assume “ownership” of them. If another team needed those resources with a different configuration, it would again take a lot of manual networking configuration, provisioning and deployment. So teams would often be reluctant to share those resources even if they were not being fully utilized. This led to a lot of under-utilization and increased costs, as different teams would either procure their own resources or wait around until a resource was released by another team.

With Ironic, on the other hand, the variety of hardware resources can be pooled together, and when users need a specific type of bare metal resource, such as an HP box with an AMD processor, they can simply request the flavor that has already been discovered and provisioned by Ironic and instantly deploy their workloads on that server in a self-service manner. Internally, Platform9 had tried to solve this problem with Cobbler and a lot of homegrown automation. This approach required a lot of time, effort and maintenance of the automation code, which could have been better spent on improving and adding value to our product. Using Ironic simplified and streamlined the bare metal provisioning and automation of our testing environment.
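
To make that self-service flow concrete, the sketch below shows roughly what such a request looks like against the standard Nova compute API, which Ironic sits behind when used as a Nova driver: the user boots a “server” exactly as they would a VM, and only the bare metal flavor differs. The endpoint, token, flavor ID, image ID and network ID are placeholder assumptions, not Platform9’s actual values.

    // Minimal sketch: requesting a bare metal server through the ordinary
    // Nova compute API. With Ironic behind Nova, choosing a bare metal
    // flavor is essentially all that distinguishes this from booting a VM.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    const (
        // Placeholders: real values come from Keystone and the service catalog.
        computeURL = "https://nova.example.com/v2.1"
        authToken  = "gAAAAAB-example-token"
    )

    func main() {
        body, err := json.Marshal(map[string]interface{}{
            "server": map[string]interface{}{
                "name":      "ci-worker-01",
                "flavorRef": "baremetal-hp-amd",                              // placeholder ID of a bare metal flavor
                "imageRef":  "9d2c1f60-0000-0000-0000-000000000000",          // placeholder deploy image ID
                "networks":  []map[string]string{{"uuid": "f0f0f0f0-0000-0000-0000-000000000000"}},
            },
        })
        if err != nil {
            log.Fatal(err)
        }

        req, err := http.NewRequest(http.MethodPost, computeURL+"/servers", bytes.NewReader(body))
        if err != nil {
            log.Fatal(err)
        }
        req.Header.Set("X-Auth-Token", authToken)
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        fmt.Println("Nova responded:", resp.Status) // 202 Accepted on success
    }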

What benefits does Ironic provide your users?

The biggest benefit is time savings, without a doubt. With Ironic, the time to provision bare metal servers is orders of magnitude faster, going from the weeks or months it used to take with manual methods to just under 20 minutes. The effect this has on our testing times, and consequently on improving our software release timelines, is revolutionary. In our customer environments, a whole manual ticketing process that starts from racking and stacking, switch configuration, network configuration and server provisioning through to operating system deployment is completely automated and replaced by a simple single-click self-service experience.

From an administration standpoint, Ironic also reduces management overhead significantly by providing a centralized operational console with complete visibility of all the resources in terms of their location (racks etc.) and their specific configurations (CPU, RAM, storage and special hardware such as GPU).  As a result, maintenance and updates also become fast and easy.  Another big benefit is repeatability, i.e. the ability to provision a bare metal server with exactly the same images and configuration, thus making it possible to treat them as “cattle” and the ability to swap them out without impacting availability or reliability in production.

And finally, as with other OpenStack projects, Ironic supports the notion of plugins, so you can use any switch from Juniper, Cisco or Arista, allowing the network automation (such as virtual LAN configuration) to be agnostic to whatever switches are being used. This is extremely important from our product perspective as it allows us to easily integrate into various customer networking environments.

What feedback do you have for the upstream OpenStack Ironic team?

More documentation!  There are a lot of areas in the product that feel lightly documented.  Specifically which versions certain features are in.  The Ironic Inspector rules and modules were fun to kind of reverse engineer to figure out.

Learn more

You’ll find an overview of Ironic on the project Wiki.

Discussion of the project  takes place in #openstack-ironic on irc.freenode.net. This is a great place to jump in and start your ironic adventure. The channel is very welcoming to new users – no question is a wrong question!

The team also holds one-hour weekly meetings at 1500 UTC on Mondays in the #openstack-ironic room on irc.freenode.net, chaired by Julia Kreger (TheJulia) or Dmitry Tantsur (dtantsur).

Stay tuned for more case studies from organizations participating in the initial launch of the program.

The post OpenStack Ironic Bare Metal Program case study: Platform9 appeared first on Superuser.

by Nicole Martinelli at April 29, 2019 03:53 PM

OpenStack Ironic Bare Metal Program case study: CERN

The OpenStack Foundation announced that its Ironic software is powering millions of cores of compute all over the world, turning bare metal into automated infrastructure ready for today’s mix of virtualized and containerized workloads. Some 30 organizations joined for the initial launch of the OpenStack Ironic Bare Metal Program, and Superuser is running a series of case studies to explore how people are using it.

The European Organization for Nuclear Research, known as CERN, provides several facilities and resources to scientists all around the world, including compute and storage resources, for their fundamental research.

CERN has been using Ironic in production for about 18 months and it’s now the standard tool to provision all new hardware deliveries for their end users. Team members tell us that they currently have around 2,000 nodes managed by Ironic, but aim to enroll the majority of their remaining 10,000 servers over the course of the next 12 months. Work to integrate the pre-production burn-in, the up-front performance validation and the retirement workflows is currently ongoing in collaboration with the upstream team.

Why did you select OpenStack Ironic for your bare metal provisioning?

For several years, the CERN IT department has been providing compute resources to the laboratory’s experiments and administrative services via an OpenStack-based private cloud. Ironic was the natural choice to complement the service’s offering of virtual machines and container clusters by physical machines: the users access physical resources via the same interfaces and workflows as they already do for virtual machines and containers.

What was your solution before implementing Ironic?

Before Ironic, the whole provisioning workflow was based on tools built in-house, with the corresponding man-power-intensive maintenance workload. Part of these workflows, in particular the more user-facing parts, have now been moved to Ironic. We’re actively working on moving additional workflows, such as the initial burn-in or performance verification, and are doing so together with the upstream community.

What other technologies does your OpenStack Ironic deployment interact with?

One of the reasons we chose Ironic was that the APIs are identical to the ones used for virtual machines in Nova. As we have a substantial number of container clusters created via OpenStack Magnum, this interface similarity now allows for provisioning of (mostly Kubernetes) clusters on bare metal machines. Removing the additional virtualization tax is relevant for performance sensitive applications, such as the experiments’ analysis code, which run on our batch system.

What benefits does Ironic provide?

The CERN IT department provides resources to all Large Hadron Collider (LHC) experiments and needs to make sure that the provisioned resources are correctly accounted for. The integration of physical resources into OpenStack via Ironic not only allows for a simplification of resource allocation, e.g. with respect to quotas, but also for streamlining the accounting process (since all accounting information will eventually come from a single source).

What feedback do you have for the upstream OpenStack Ironic team?

The Ironic team has been helpful to work with, both for first-time deployers who may have encountered issues or questions during the setup and for new contributors who have needed clarifications or wanted to make suggestions to improve the code base. As an operator, we have particularly appreciated the balance between the constructive feedback given when proposing a new feature and the need to maintain a code base which also has to consider other use cases and backwards compatibility.

Learn more

You’ll find an overview of Ironic on the project Wiki.

Discussion of the project  takes place in #openstack-ironic on irc.freenode.net. This is a great place to jump in and start your ironic adventure. The channel is very welcoming to new users – no question is a wrong question!

The team also holds one-hour weekly meetings at 1500 UTC on Mondays in the #openstack-ironic room on irc.freenode.net, chaired by Julia Kreger (TheJulia) or Dmitry Tantsur (dtantsur).

Stay tuned for more case studies from organizations participating in the initial launch of the program.

Cover image: © CERN

The post OpenStack Ironic Bare Metal Program case study: CERN appeared first on Superuser.

by Nicole Martinelli at April 29, 2019 03:52 PM

It’s showtime for the OpenStack Ironic Bare Metal Program

DENVER — OpenStack Ironic—the bare metal provisioning project—now manages millions of cores of compute all over the world, the tech equivalent of playing to a stadium crowd.

Thirty organizations banded together for the initial launch of the OpenStack Ironic Bare Metal Program, including vendors running some of the world’s largest OpenStack clouds.  They range from giants like Verizon Media and CERN to operators such as France’s leading online classified ads company, LeBonCoin.  Here’s a look at how two of them are using it and why they adopted it.

Ironic offers many benefits to cloud architects and administrators who need to harmonize bare metal instances. The software supports automating the entire server infrastructure life cycle of deployments, including updates and decommissioning. It delivers cloud-like bare metal infrastructure with multi-tenant networks to end users when used as a driver to OpenStack Nova. With a standard API, broad driver support and lightweight footprint, Ironic excels as a management engine for a variety of bare metal infrastructure use cases – as demonstrated by the newly developed Bare Metal Operator for Kubernetes. These features make it work for a wide range of use cases, from small edge deployments to large data centers.

Take, for example, VEXXHOST, a leading Canadian public, private and hybrid cloud provider with infrastructure powered by OpenStack.

“Ironic enables customers to dynamically increase or decrease their hypervisors and they don’t have to run two ‘compute’ pools but one converged pool of compute which can be used both for virtual machines and/or bare metal,” says CEO Mohammed Naser, adding that Ironic simplifies the management of their servers. His company now ships it as part of their private cloud product, making the entire infrastructure powered by OpenStack from the ground up. “We only build out three controllers which then manage all the bare metal infrastructure, which you can use both as metal machines or leverage as hypervisors,” he says.

StackHPC, a startup focused on high-performance computing (HPC), software-defined networking (SDN), software engineering and OpenStack, has been a user and developer of Ironic from the beginning. Ironic’s potential was apparent from the outset, says CTO Stig Telfer, and the team has been active in shaping some aspects of its evolution.

Clients in technical computing often have very different priorities in terms of trade-offs between performance and flexibility, he notes. The Bristol, UK-based company aims to develop solutions where Ironic is used in flexible ways, allowing clients to exploit the advantages offered by open infrastructure without sacrificing the performance they require.

StackHPC also uses Ironic as a standalone service (Bifrost) in Kayobe, their open-source deployment tool. Used in this way, Ironic provides the minimal subset of OpenStack needed to support deployment of private cloud control planes using Kolla and Kolla-Ansible.

Stay tuned to Superuser for more case studies —  like these from CERN and Platform9 — from the OpenStack Ironic Bare Metal Program.

Get involved

Read more about the latest Ironic updates and features in the recent Stein release and learn how to use and contribute to the project.

The community also formed a Bare Metal Special Interest Group (SIG) in February. Its mission is to make Ironic easy to operate and to evangelize the use cases and utility of the bare metal service. To get involved in the SIG, subscribe to the OpenStack Discuss mailing list or contact chris@openstack.org.

 

The post It’s showtime for the OpenStack Ironic Bare Metal Program appeared first on Superuser.

by Nicole Martinelli at April 29, 2019 03:46 PM

StackHPC Team Blog

StackHPC Joins the OpenStack Bare Metal Program

At StackHPC, our client requirements often take the form that we must deliver cloud-native infrastructure without making any sacrifice to existing levels of performance. This can be challenging at times, but would not be possible at all without OpenStack Ironic, the engine that makes software-defined bare metal work.

Ironic enables our clients to deploy on-premise high-performance computing infrastructure using the same methods they would use to deploy infrastructure in the cloud. This is driving a revolution in research computing infrastructure management.


The StackHPC team's commitment to Ironic is long and deep, and pre-dates the formation of StackHPC itself. Within StackHPC we have made it a core component of our expertise. At the Open Infrastructure Summit this week in Denver, check out StackHPC team member Mark Goddard's presentation of his recent work on deep reconfigurability of bare metal. And come along to our hands-on workshop, A Universe from Nothing to get familiar with Kayobe, our Ironic-centric deployment tool for Kolla-Ansible OpenStack. Both are on Tuesday afternoon.

We'll also be talking about our commitment to high performance computing (and no doubt touching on the role Ironic can play in delivering it) in John Garbutt's presentation Lessons learnt federating OpenStack powered supercomputers on Monday afternoon, and Stig Telfer's panel session HPC using OpenStack on Wednesday morning.

Finally, the Scientific SIG on Monday afternoon always includes a boat load of bare metal.

by Stig Telfer at April 29, 2019 08:00 AM

April 26, 2019

RDO

RDO Stein Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Stein for RPM-based distributions, CentOS Linux and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Stein is the 19th release from the OpenStack project, which is the work of more than 1200 contributors from around the world.

The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/7/cloud/x86_64/openstack-stein/.

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Linux and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS Linux users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

Photo by Yucel Moran on Unsplash

New and Improved

Interesting things in the Stein release include:

  • Ceph Nautilus is now the default version of Ceph within RDO. Ceph is a free-software storage platform that implements object storage on a single distributed computer cluster and provides interfaces for object-, block- and file-level storage. Within Nautilus, the Ceph Dashboard has gained a lot of new functionality, like support for multiple users / roles, SSO (SAMLv2) for user authentication, auditing support, a new landing page showing more metrics and health info, I18N support, and REST API documentation with Swagger API.

  • The extracted Placement service, used to track cloud resource inventories and usages to help other services effectively manage and allocate their resources, is now packaged as part of RDO. Placement has added the ability to target a candidate resource provider, making it easier to specify a host for workload migration, increased API performance by 50% for common scheduling operations, and simplified the code by removing unneeded complexity, easing future maintenance.

Other improvements include:

  • The TripleO deployment service, used to develop and maintain tooling and infrastructure able to deploy OpenStack in production, using OpenStack itself wherever possible, added support for podman and buildah for containers and container images. Open Virtual Network (OVN) is now the default network configuration, and TripleO now has improved composable network support for creating L3 routed networks as well as IPv6 network support.

Contributors

During the Stein cycle, we saw the following new RDO contributors:

  • Sławek Kapłoński
  • Tobias Urdin
  • Lee Yarwood
  • Quique Llorente
  • Arx Cruz
  • Natal Ngétal
  • Sorin Sbarnea
  • Aditya Vaja
  • Panda
  • Spyros Trigazis
  • Cyril Roelandt
  • Pranali Deore
  • Grzegorz Grasza
  • Adam Kimball
  • Brian Rosmaita
  • Miguel Duarte Barroso
  • Gauvain Pocentek
  • Akhila Kishore
  • Martin Mágr
  • Michele Baldessari
  • Chuck Short
  • Gorka Eguileor

Welcome to all of you and Thank You So Much for participating!

But we wouldn’t want to overlook anyone. A super massive Thank You to all 74 contributors who participated in producing this release. This list includes commits to rdo-packages and rdo-infra repositories:

  • yatin
  • Sagi Shnaidman
  • Wes Hayutin
  • Rlandy
  • Javier Peña
  • Alfredo Moralejo
  • Bogdan Dobrelya
  • Sławek Kapłoński
  • Alex Schultz
  • Emilien Macchi
  • Lon
  • Jon Schlueter
  • Luigi Toscano
  • Eric Harney
  • Tobias Urdin
  • Chandan Kumar
  • Nate Johnston
  • Lee Yarwood
  • rabi
  • Quique Llorente
  • Chandan Kumar
  • Luka Peschke
  • Carlos Goncalves
  • Arx Cruz
  • Kashyap Chamarthy
  • Cédric Jeanneret
  • Victoria Martinez de la Cruz
  • Bernard Cafarelli
  • Natal Ngétal
  • hjensas
  • Tristan de Cacqueray
  • Marc Dequènes (Duck)
  • Juan Antonio Osorio Robles
  • Sorin Sbarnea
  • Rafael Folco
  • Nicolas Hicher
  • Michael Turek
  • Matthias Runge
  • Giulio Fidente
  • Juan Badia Payno
  • Zoltan Caplovic
  • agopi
  • marios
  • Ilya Etingof
  • Steve Baker
  • Aditya Vaja
  • Panda
  • Florian Fuchs
  • Martin André
  • Dmitry Tantsur
  • Sylvain Baubeau
  • Jakub Ružička
  • Dan Radez
  • Honza Pokorny
  • Spyros Trigazis
  • Cyril Roelandt
  • Pranali Deore
  • Grzegorz Grasza
  • Bnemec
  • Adam Kimball
  • Haikel Guemar
  • Daniel Mellado
  • Bob Fournier
  • Nmagnezi
  • Brian Rosmaita
  • Ade Lee
  • Miguel Duarte Barroso
  • Alan Bishop
  • Gauvain Pocentek
  • Akhila Kishore
  • Martin Mágr
  • Michele Baldessari
  • Chuck Short
  • Gorka Eguileor

The Next Release Cycle

At the end of one release, focus shifts immediately to the next, Train, which has an estimated GA the week of 14-18 October 2019. The full schedule is available at https://releases.openstack.org/train/schedule.html.

Twice during each release cycle, RDO hosts official Test Days shortly after the first and third milestones; therefore, the upcoming test days are 13-14 June 2019 for Milestone One and 16-20 September 2019 for Milestone Three.

Get Started

There are three ways to get started with RDO.

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

For a production deployment of RDO, use the TripleO Quickstart and you’ll be running a production cloud in short order.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help

The RDO Project participates in a Q&A service at https://ask.openstack.org. We also have the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on Freenode IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS mailing lists and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net), however we have a more focused audience within the RDO venues.

Get Involved

To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo on the Freenode IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Rain Leander at April 26, 2019 11:03 PM

Ed Leafe

More fun with etcd-compute

Last time I ended my work getting etcd-compute running at the point where I needed to configure the virtual networking. I’ve been busy the past few days with meetings and other work-related stuff, so it’s taken me a while to continue on this experiment. But I have some time now; let’s jump back in! The … Continue reading "More fun with etcd-compute"

by ed at April 26, 2019 03:02 PM

OpenStack Superuser

Airship: 1.0 ready to dock

Airship is getting ready to reach a milestone on its 1.0 voyage at the Open Infrastructure Summit Denver next week. A year after launching, the team is putting the final touches on its first release with a wide range of enhancements to security, resiliency, continuous integration and documentation, as well as upgrades to its platform, deployment and tooling features.

Airship is a collection of loosely coupled but interoperable open source tools that declaratively automate cloud provisioning. (Although the name brings to mind a dirigible, Airship tools are nautically themed.) It serves as a robust delivery mechanism for organizations that want to embrace containers as the new unit of infrastructure delivery at scale.

Starting from bare metal, Airship manages the full life cycle of infrastructure to deliver a production-grade Kubernetes cluster with Helm-deployed artifacts, including OpenStack-Helm. One workflow handles both initial deployments as well as future site updates. In managing your infrastructure, Airship has several goals:

  • Simplicity: Infrastructure is managed through declarative YAML files and there’s one workflow for both deployments and updates. Airship doesn’t require operators to develop their own set of complex orchestration tooling to automate Airship.
  • Flexibility: Containers and Helm charts are the basic unit of deployment for all software including Airship itself, pushing software orchestration logic to the edge. Expanding the software stack is as simple as adding new charts to Airship declarations.
  • Repeatability: Platform state including all versions are specified declaratively, and Airship, Helm, and Kubernetes align containers, dependencies, and configuration in the same way every time.
  • Resiliency: All jobs and services are run as containers, provide health status, and are healed by Kubernetes supervision – taking full advantage of native Kubernetes resiliency.
  • Self-hosting: The Airship components themselves are deployed as Helm charts and run as services within Kubernetes. This allows them to be upgraded like any other software component in the system.

If you want to get started with Airship, there are several great ways to get moving:

  • If you’re at the Open Infrastructure Summit next week, check out these sessions including users like AT&T running Airship in production as well as opportunities to join contributor onboarding.
  • The Airship in a Bottle project gives you a complete all-in-one testing and development environment to give you basic, hands-on experience.
  • When you’re ready to deploy to your data center, the Treasuremap outlines the configuration of a reference architecture. With nightly integration testing against the configuration guidelines, you can be certain that Treasuremap gives you an accurate view of how to configure and deploy Airship.

Airship has an active and welcoming community, with weekly development, design, and program meetings open to all. For immediate conversation with Airship developers and deployers, stop by the #airshipit IRC channel on Freenode, and keep up to date on the latest news with the Airship mailing list. The team will also be at the Open Infrastructure Summit in Denver during the week of April 29. There are over 10 sessions planned including:

  • Lessons Learned running open infrastructure on bare metal Kubernetes clusters in production
  • Bare metal provisioning in Airship, or “Ironic: it’s not just for OpenStack anymore”
  • Securing your cluster network using Calico and OpenStack Helm Infra
  • An Airship project update
  • Airship project onboarding

We’re looking forward to sharing this exciting new release with you. Keep your eyes on the horizon in the coming week for more news, live demos and presentations from the Airship community. Find out more at airshipit.org and join us at the Open Infrastructure Summit!

The post Airship: 1.0 ready to dock appeared first on Superuser.

by Chris Hoge at April 26, 2019 02:07 PM

Inside open infrastructure: The latest from the OpenStack Foundation

Welcome to the latest edition of the OpenStack Foundation Open Infrastructure newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

OpenStack Foundation project news

OpenStack

OpenStack Stein was recently released, but the community is already on board for the next release, Train, scheduled to arrive in October. At next week’s Open Infrastructure Summit in Denver, you can join open design discussions at the Forum. Etherpads will be used for collaborative note-taking during those sessions. Then various contributor teams will take advantage of the Project Teams Gathering event to meet and discuss how to organize the upcoming work.

The process to select community goals for the Train release is almost complete: check out the two proposed cross-team goals (PDF docs generation support and IPv6 testing in the gate) and learn about the proponents driving them.

StarlingX

StarlingX was one of the projects with the second largest presence at the ninth Open Source Hackathon that took place in Shenzhen, China, April 18 – 20. During the hackathon, the community worked on bug fixes and discussed new features for the projects. You can check out their progress on the event Etherpad.

Zuul

Zuul 3.8.0 has been released and includes an important security update. Users should upgrade to this version. More information can be found on the release announcement.

Meet the Zuul community at the Open Infrastructure Summit, April 29 – May 1 in Denver, Colorado. Topics include a project update and opportunities to hear from users of Zuul across a variety of communities, technologies and industries (Airship, Finance, Kubernetes, OpenLab, SR-IOV).

Kata Containers

Kata Containers testing and packaging is powered by a diverse ecosystem of infrastructure donors including AWS, Google Compute Engine, IBM, Microsoft Azure, openSUSE Open Build Service, PackageCloud, Packet and Vexxhost, an OpenStack-powered public cloud. The Kata project runs continuous integration (CI) in the cloud for a few reasons. First, maintaining its own infrastructure was not a viable option for an open and distributed development team. Second, the community wanted to make sure that Kata Containers runs properly on different clouds, so the CI is run using the services of multiple cloud service providers. Learn more about how Kata manages testing in this post.

Next week the Kata community is excited to gather at the Open Infrastructure Summit and PTG in Denver to discuss container security and collaborate with other projects. See the full lineup of Kata talks here.

Open infrastructure community events

Hope to see you at the Open Infra Summit in Denver. Afterwards, here’s where you can catch up with the global community:

Questions / feedback / contribute

This newsletter is edited by the OpenStack Foundation staff to highlight open infrastructure communities. We want to hear from you!
If you have feedback, news or stories that you want to share, reach us at community@openstack.org. To receive the newsletter, sign up here.

The post Inside open infrastructure: The latest from the OpenStack Foundation appeared first on Superuser.

by OpenStack Foundation at April 26, 2019 12:06 PM

April 25, 2019

Aptira

Software Interlude. Part 6 – Development Paradigms


In our last post, we discussed why software development (and managing software development projects) is so hard. In this Part 6 of the Open Networking Software Interlude, we look at generic paradigms for building things and use them to compare the different areas we encounter in Open Networking.

Open Networking integrates many practices, technologies, occupations and even organisational units which have previously been quite distinct and separate. Bringing these together highlights some intrinsic differences in approach, and understanding these differences enables us to manage them.

Preliminary – the three elements of building things 

Humans have a long history of “building things”, be they civil engineering projects (e.g. the Suez and Panama Canals), religious edifices (e.g. the Pyramids of Giza or Christian cathedrals), new industrial products (e.g. the Model T Ford), or even relatively commonplace modern projects such as multi-storey skyscrapers or the development of new software products.

Throughout this history, there have been many different approaches to how these new things got built. One differentiator is the “stuff” from which these new things were built: stone or steel or concrete or software. 

Another differentiator is the arrangement and relationship between three key elements of each project: “design”, “production” and “management”. 

  • Design: the analysis of needs and the specification of both a thing to be built and (where appropriate) the processes and mechanisms for building it; 
  • Production: the processes and mechanisms for building the new thing (think: a manufacturing line in a factory); and 
  • Management: the control of the end-to-end process, the procurement and assignment of resources, etc. (Henri Fayol and successors).

It is beyond the scope of this post to go into this in great detail, but in summary the historical evolution of development has resulted in the progressive separation of “design”, “production” and “management”. 

This evolution ranged from prehistoric times, when all three were essentially merged into one entity, typically embodied in a king or head of state (often equated with a “god in human form”), to modern industrial processes, in which all three are progressively disaggregated and separated along multiple dimensions (professional, organisational and legal/administrative).

This progression continued steadily until the invention of software development. 

Early approaches to software were based on non-software industrial and engineering processes in which these three elements were quite distinct and separated. This led to rules-based and serialised implementation strategies (e.g. Waterfall), which were efficient from a management perspective and well understood based on established practice.

But we all know how troublesome that has been (and remains) in software development projects. 

This has changed in the last 20 years or so. Whilst agile has introduced many things, one of the fundamental paradigm shifts in agile software development has been to re-integrate “design” and “production” into the same process, organisational and management structures. 

“Theory Building” and Design 

From the idea that software development is largely about building a theory of a problem and its solution, it isn’t too far a step to generalise this very technical and specific activity into a very common one: design.

The term “design” has many meanings, but at its core it is a problem-solving exercise intended to create a solution that meets many requirements.

By aligning the idea of “theory building” in programming to the idea of design generally, we can take a perspective on software and software development that compares and aligns it with other productive activities and the fundamental breakdown into “design” and “production”. 

In software development, “design” and “production” are integrated into the one process: the more tightly integrated they are, in practice, the better the software development process works. And the more separated they become, on balance, the less successful the process is.

Contrast this with an industrial process, for example the mass-market products domain. Here, a product is designed and then a factory manufactures many copies of that design: there is very little overlap between the two phases, and each phase is performed by different people or groups, even different organisations.

“Creators need an immediate connection to what they’re creating.” (Bret Victor, “Inventing on Principle”)

https://jamesclear.com/great-speeches/inventing-on-principle-by-bret-victor

So in this case, “design” is very separate from “production”, and there are fairly robust practices to help ensure that this separation doesn’t result in design defects being manufactured and distributed to customers. Even so, such defects are not a rare occurrence.

Other engineering work, for example Network Engineering, seems far more related to the separate “design” and “production” paradigm than to the combined paradigm. These paradigms can be deeply entrenched, often unconsciously, in the thought processes, tools and even operating procedures that surround the development of each type of solution, and to a large extent these paradigms are highly incompatible.

What does this mean for Open Networking? 

What is the implication for Open Networking? Because of the wide net cast over the components of Open Networking practice, it is inevitable that multiple paradigms become enmeshed in the same projects, working on the same things. In many projects this doesn’t occur, because they are either mostly software or mostly hardware. But in Open Networking, these incompatibilities are both stark and widely distributed.

This can lead to significant dissonance that must be recognised and managed carefully. 

We will address this in more detail in a later post. 

Stay tuned…. 

The post Software Interlude. Part 6 – Development Paradigms appeared first on Aptira.

by Adam Russell at April 25, 2019 11:18 AM

April 24, 2019

OpenStack Superuser

Get involved with diversity and inclusion events at the Denver Summit

You’ll get the big picture with keynotes, then dive in with the working group or get up to speed over lunch on diversity and inclusion efforts in the community at the Open Infrastructure Summit. But wait, there’s more: this is the second Summit since the pivot from the Women of OpenStack (WOO) group to a broader focus on diversity. Here’s a look at all the events on the agenda for Denver:

Speed Mentoring lunch

If you’re new to open infrastructure or would like some mentoring, this session is a great icebreaker and a way to get to know new and experienced people in the open-source community. The Intel-sponsored session will be divided between career, technical and community mentoring. Mentees will be organized into small groups, and each group will have several 15-minute mentoring sessions. In your small group, you’ll get to know a bit about a mentor and have an opportunity to ask them a question or two about how you can grow your career, get involved in the community and make the most of the Summit. Then, after 15 minutes, a new mentor will cycle to your group and the process will repeat. If you want to mentor, just indicate that when you RSVP.
Details here.

Diversity Networking lunch

Join the Diversity Working Group for lunch to meet and network with the open-source community and discuss the best strategies for supporting each other. Sponsored by IBM, the group will celebrate accomplishments over the past year, break into small groups to discuss updating its initiatives and cover other topics related to diversity and growing the community.
Details here.

“Chasing Grace” screening

Join Jennifer Cloer, the executive producer and director of the “Chasing Grace” project, for a private screening of episode two, “Progress and the Power of Community.” This episode explores where progress happens: in the individual stories of women who have started companies, nonprofit organizations and women-in-tech groups. It also explores the challenges they face when they stand up, speak out or challenge the status quo, to understand where we’re failing so that we may accelerate progress.
Details here.

Diversity and Inclusion Survey: Results

The OpenStack Diversity and Inclusion WG conducted a six-month survey to learn more about the members of our community. The results have been anonymously compiled by members of the CHAOSS project, a Linux Foundation project focused on creating analytics and metrics to help define community health. This session will share those results.
Details here.

The Diversity and Inclusion Working Group is also meeting up during the event; you can check out its events and biweekly meeting schedule here.

 

The post Get involved with diversity and inclusion events at the Denver Summit appeared first on Superuser.

by Superuser at April 24, 2019 10:44 PM

About

Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology, you should add your OpenStack blog.
