October 10, 2019

Aptira

10% off + FREE Consulting – FINAL DAYS!

Aptira 10 year birthday 10% off sale

Final days to claim our Birthday Special!

In case you missed it: on the 9th of the 9th, 2019, we turned 10! So until the 10th of the 10th, we’re offering 10% off all our services. That’s 10% off managed services, 10% off training, 10% off everything except hardware. This 10% discount also applies to pre-paid services, so you can pre-pay for the next 12 months to really maximise your savings!

We’re also offering a free 2 hour consulting session to help get you started with transforming your Cloud solution.

This offer is ending soon, so chat with a Solutionaut today to take advantage of this once in a decade discount and let us turn your business capabilities into a competitive advantage.

Let us make your job easier.
Find out how Aptira's managed services can work for you.

Find Out Here

The post 10% off + FREE Consulting – FINAL DAYS! appeared first on Aptira.

by Jessica Field at October 10, 2019 12:59 PM

OpenStack Superuser

From Containers to Edge Computing, Research Organizations Rely on Open Infrastructure

A lot separates Milan and Rome—including three hours by train—but one thing that connects these two cities is the open infrastructure community. 

The Italian community organizers—Binario Etico and Irideos— made two big changes to the local event this year. First, they renamed the OpenStack Day to OpenInfra Days to broaden the scope of the content at the event. They also planned two events this year in order to put the latest trends and user stories in front of as many local community members as possible. The events would not have been possible without the support of the event sponsors: D2iQ, GCI, Linux Professional Institute, OpenStack Foundation, and Mellanox. 

A combined crowd of over 300 attendees gathered in Milan and Rome last week at the OpenInfra Days Italy to hear how organizations are building and operating open infrastructure. 

Mariano Cunietti and Davide Lamanna kicked off both events explaining how important it is for European organizations to embrace open source components and cross community collaboration.

“It’s the way we collaborate and the way we shape communication flow that works,” Cunietti said. “Collaborative open source is a way to shift technicians from being consumers to participants and citizens of the community. This is a very important shift.” 

From a regional perspective, Lamanna explained how European standards and privacy laws create requirements that have given local open source organizations a competitive advantage around interoperability and flexibility features.

To exemplify the power of open infrastructure and community collaboration in Europe, several users shared their production stories. One sector that is especially strong in Europe—particularly Italy—is research.

  • GARR: Saying that no infrastructure is open until you open it, GARR harmonizes and implements infrastructure for the benefit of the scientific community in Italy—a community of around 4 million users. Alex Barchiesi shared some stats around GARR’s current OpenStack deployment—8,500 cores with 10 PB of storage in five data centers across three regions—as well as their approach to identity federation. GARR’s concept of federation: the simpler, the better; the fewer the requirements, the more inclusive. With their multi-region, multi-domain model, Barchiesi explained how they have architected a shared identity service. To give back to the community, the GARR team contributes upstream to OpenStack Horizon, k8s-keystone-auth, and Juju charms. 
  • The Istituto Nazionale di Fisica Nucleare (INFN)—an Italian public research institute for high energy physics (which also collaborates with CERN!)—has a private cloud infrastructure that is OpenStack-based and geographically distributed across three major INFN data centers in Italy. The adoption of Ceph as a distributed object storage solution enables INFN to provide both local block storage at each of the participating sites and a ready-to-use disaster recovery solution implemented across those same sites. Collectively, the main data centers have around 50,000 CPU cores, 50 PB of enterprise-level disk space, and 60 PB of tape storage.  
  • While CERN is not based in Italy, its OpenStack and Kubernetes use case offers lessons to operators around the world. Jan van Eldik shared updated stats around CERN’s open infrastructure environment, with a focus on OpenStack Magnum, Ironic and Kubernetes. CERN by the numbers: more than 300,000 OpenStack cores, 500 Kubernetes clusters, and 3,300 servers managed by OpenStack Ironic (expected to be 15,000 in the next year). 

Outside of the research sector, other open infrastructure stories included the city government of Rome’s OpenStack use case, Sky Italia’s creation of a Kubernetes blueprint and network setup that powers their brand new Sky Q Fibra service, and the SmartME project, which is deploying OpenStack at the edge for smart city projects in four cities across Italy. 

What’s next for the open infrastructure community in Italy? Stay tuned on the OpenStack community events page for deadlines and event dates. 

Can’t wait until 2020? Join the global open infrastructure community at the Open Infrastructure Summit Shanghai from November 4-6.

Cover photo courtesy of Frederico Minzoni.

The post From Containers to Edge Computing, Research Organizations Rely on Open Infrastructure appeared first on Superuser.

by Allison Price at October 10, 2019 12:00 PM

October 09, 2019

Mirantis

SUSE OpenStack is no more — but Don’t Panic

SUSE has announced they're discontinuing their OpenStack distro, but it's not the end of the line for their customers.

by Nick Chase at October 09, 2019 08:27 PM

Aptira

Open Source Networking Days Australia

Open Source Networking Days Australia

Coming Soon: Open Source Networking Days Australia

Open Source Networking Day Australia is a one-day mini-summit hosted by Telstra and co-organized by LF Networking (LFN) and Aptira.

This is the first time that LFN has brought an open source networking event to Australia and it will be a unique opportunity to connect and collaborate with like-minded community members that are passionate about open source networking. The event will bring together service providers, the developer community, industry partners and academia for a day of collaboration and idea exchange on all things related to open-source networking, including LF Networking (LFN) projects like ONAP, OpenDaylight, Tungsten Fabric and Open Networking Foundation (ONF) projects like COMAC, Stratum, ONOS and P4, as well as home-grown innovation such as OpenKilda and many more.

To make open source networking viable in Australia, we need to collectively grow awareness, skills and investment. By attending this event, attendees will learn about the state of open source networking adoption globally and locally, how open source is applied in network automation, the evolution of software defined networking, and how open source enables exciting use cases in edge computing. Attendees will have plenty of opportunities to interact with global experts, industry peers and developers via keynote sessions, panel Q&As, technical deep-dives and business discussions, and, more importantly, to learn how to get involved in open source networking communities going forward. Registration is free, so register today and we hope to see you in Melbourne!

Melbourne, Australia | November 11, 2019
8:30 am – 5:00 pm
Telstra Customer Insight Centre (CIC)
Tickets

In addition to this, there will also be a Next-Gen SDN Tutorial hosted on the 12th of November.

Next-Gen SDN is delivering fine-grained network programmability with zero touch configuration and management, enabling operators’ complete control of their networks. Leveraging P4, P4Runtime, OpenConfig/gNMI and gNOI, NG-SDN is now truly delivering on the ‘software defined’ promise of SDN for future transformation, new applications and unprecedented levels of new value creation.

This tutorial is an opportunity for architects and engineers to learn the basics and to practically experiment with some of the building blocks of the NG-SDN architecture, such as:

  • P4 language
  • Stratum (P4Runtime, OpenConfig over gNMI, gNOI)
  • ONOS

The goal of the tutorial is to answer questions such as:

  • What is P4 and how do I use it?
  • How do I go from a P4 program to a complete network solution?
  • What is Stratum and how can I use its interfaces to control packet forwarding, configure ports, or push software upgrades to my network devices?
  • How can I use ONOS to write control-plane apps for my P4 program?

It is organized around a sequence of introductory presentations, as well as hands-on exercises that show how to build a leaf-spine data center fabric from scratch based on IPv6 using P4 and ONOS.

The tutorial will include an introduction to the P4 language, Stratum, and ONOS. Participants will be provided with starter P4 code and an ONOS app implementation, along with instructions to run a Mininet-emulated leaf-spine topology of Stratum-enabled software switches. Only basic programming and networking knowledge is required to complete the hands-on exercises. Knowledge of Java and Python will be helpful to understand some of the starter code.
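For readers who want a feel for the hands-on portion, here is a minimal sketch of a leaf-spine topology built with Mininet's Python API. It is not the tutorial's starter code: the tutorial uses Stratum-enabled software switches with P4 pipelines and ONOS, while this sketch only illustrates the wiring and starts no controller at all.

from mininet.net import Mininet
from mininet.topo import Topo
from mininet.cli import CLI

class LeafSpine(Topo):
    def build(self, spines=2, leaves=2, hosts_per_leaf=2):
        # Switch names share one counter so default DPIDs stay unique.
        spine_switches = [self.addSwitch('s%d' % (i + 1)) for i in range(spines)]
        host_id = 1
        for leaf_idx in range(leaves):
            leaf = self.addSwitch('s%d' % (spines + leaf_idx + 1))
            # Full mesh between each leaf and every spine.
            for spine in spine_switches:
                self.addLink(leaf, spine)
            # Hosts hang off each leaf.
            for _ in range(hosts_per_leaf):
                host = self.addHost('h%d' % host_id)
                self.addLink(host, leaf)
                host_id += 1

if __name__ == '__main__':
    # A fabric with loops needs a topology-aware controller (e.g. ONOS in the
    # tutorial); a plain learning switch would not cope, so none is started here.
    net = Mininet(topo=LeafSpine(), controller=None)
    net.start()
    CLI(net)
    net.stop()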

Registrations for the tutorial are limited to 50 people, so to secure your place register now.

Open Source Networking Days Australia Sponsors

Ready to move your network into the software defined future?
Automate your network with ONAP.

Find Out How

The post Open Source Networking Days Australia appeared first on Aptira.

by Jessica Field at October 09, 2019 12:12 PM

October 08, 2019

OpenStack Superuser

Open Infrastructure in Germany: Hitting the Road with New and Growing OpenStack Use Cases

A year after we held the OpenStack Summit Berlin, it was great to return to the city to see what has changed—hearing how OpenStack users have grown their deployments since we last saw them, finding new users sharing their stories, and learning how companies are integrating open infrastructure projects in innovative ways.

Europe’s first music hotel, its walls lined with photos of the musicians who have visited in years past, welcomed a new audience: 300 Stackers for the 2019 OpenStack Day DOST in Berlin. Community members gathered for two days of breakout sessions, sponsor demos, and waterfront views. Sessions and an evening event cruise along the Spree River were made possible by event organizers and sponsors: B1 Systems, Canonical, Netways Web Services, Noris Network, the OpenStack Foundation, Open Telekom Cloud, Rancher, and SUSE.

Germany is home to a diverse set of ecosystem vendors, and its roads are also home to cars from automakers who rely on OpenStack, including Audi and BMW, both of whom shared their use cases with conference attendees.

BMW first shared its OpenStack story at the Paris Summit in 2014 and has continued to grow its OpenStack footprint rapidly since then. Currently sitting at 700 servers, they are expecting their environment to grow by an additional 300 by the end of the year. As of today, almost 400 projects and platforms (and rising steadily) rely on the dynamic, flexible and tailor-made instance of OpenStack at the BMW Group, including autonomous driving.

Andreas Poëschl showing how BMW has grown its OpenStack environment over the years.

Audi was the second automaker of the conference to share its open infrastructure use case, powered by OpenStack and Ceph. Audi AG’s shop floor IT environment is designed for uninterrupted, highly available 24/7 operation, and these requirements make it difficult to test new, not yet evaluated technologies close to production. To quickly bring these technologies into production and make them available, the Audi Production Lab was founded. There, it is possible to incorporate the latest concepts and develop them to the point where they meet the requirements of production.

Through the construction of a self-sufficient, decoupled, independently usable, flexible, and adaptable server infrastructure based on Ceph and OpenStack in the Production Lab, it is now possible to evaluate innovative technologies such as Kubernetes and bring them to production in a timely manner.

Auto makers were not the only ones sharing their open infrastructure integration story.

  • SAP shared its Converged Cloud where the basis is OpenStack orchestrated in a Kubernetes cluster. With the newly developed Kubernikus module, the Converged Cloud enables SAP to offer its customers Kubernetes-as-a-Service, which is provided as a one-button self-service. Kubernikus creates a Kubernetes cluster that operates as a managed service and can be offered for API support. Kubernikus works with the OpenStack API and remains 100% Kubernetes and Open Source. The structure allows the separate operation of Kubernetes API and project-specific nodes.  
  • The Open Telekom Cloud, the public cloud service of Deutsche Telekom, is one of the local members of the OpenStack 100k core club. With over a quarter of a million managed CPU cores, it’s one of the largest fully managed clouds in Europe. Their team presented the DevOps model that enables their OpenStack-powered public cloud to continue to grow. 

What’s next for the open infrastructure community in Germany? The event organizers say the planning for the 2020 event in Hamburg is underway. Stay tuned on the OpenStack community events page for deadlines and event dates. 

Can’t wait until 2020? Join the global open infrastructure community at the Open Infrastructure Summit Shanghai November 4-6. 

Cover photo courtesy of NETWAYS Web Services.

The post Open Infrastructure in Germany: Hitting the Road with New and Growing OpenStack Use Cases appeared first on Superuser.

by Allison Price at October 08, 2019 01:00 PM

October 07, 2019

Mirantis

How to deploy Airship in a Bottle: A quick and dirty guide

Airship in a Bottle is a simple way to create an Airship deployment that includes a compact OpenStack cluster.

by Nick Chase at October 07, 2019 12:55 PM

October 05, 2019

Aptira

Real-World Open Networking. Part 5: Dissonance between Networks and Software Domains

Real-World Open Networking. Part 5: Dissonance between Networks and Software Domains

In our last post we finished a detailed examination of different aspects of Interoperability. In this post, we will analyse the different mindsets of traditional networking domains and software development domains, and explain why there is often built-in dissonance between them.

Background

Whilst Open Network solutions require the integration of network and software components and practices, at the current time (and historically) these two domains are largely incompatible. Unless carefully managed, this incompatibility will cause project stress and impairment.

Given that many (if not most) Open Network solutions originate in the Network Engineering department within a user organisation, this is an important consideration for the entire lifecycle of the solution; especially so if the Network Engineering team does not have established software skills and experience.

The Problem

Dissonance can arise in an Open Networking project in many ways because of these different paradigms or mindsets. Below we cover the top four aspects of the problem:

  • Design & Production paradigm conflicts
  • Ability to Iterate
  • End user Engagement
  • Expectations of Interoperability

Expectations on Development

We described in Software Interlude Part 6 – Development Paradigms that traditional network engineering aligns more with the production model of work, i.e. that the design and production processes are largely serialised and separate.

Software development on the other hand operates on a different paradigm, in which design and production are largely intermingled: not just parallel but intertwined within the same team and the same resources.

Networks (in general) are designed using discrete components and can be designed and built along fairly pre-determined and predictable steps guided by engineering principles. Networks are highly mechanical and mathematical in nature, following a well-established set of rules. Even the software components of traditional network equipment (configuration) followed rules backed up by years of mathematical research. Network designs can be validated in advance using the same techniques.

Practically, we see the implications of this in the way network projects are executed. Formally, network projects follow far more of a plan-based (aka Waterfall) lifecycle model. There are many logical reasons why the plan-based approach is better for this type of project.

Informally, we also see this: it’s typical that a senior, more experienced person will do the network design and create a specification for how the network is to be built. This network design is typically handed off to other technical personnel for the build.

Expectations on the ability to iterate

Flexibility is a key aspect of software development projects: it underpins everything that a software developer does and thinks. Networks appear to value other things: integrity, security and so on. The difference comes down to the relative size of increments, prototypes and/or MVP’s. Note: the MVP (Minimum Viable Product) is the smallest component that can be deployed to production and which enables at least one valuable use case.

Small increments in functionality, prototypes and MVP’s are important parts of the solution development process. These all support the agile principles of inspect and adapt.

For software, these increments can be very small and be produced very rapidly. Traditionally, in the network domain, creating a small instance of some aspect of a solution has a much higher hurdle. Model labs or test environments may exist, but these are typically insufficient for the dynamic changes required by the need to iterate; that is, if they are available at all, and/or have the right or sufficient quantities of hardware.

Expectations on End User Engagement

It is not uncommon for network projects to be built to very general requirements and not to specific end-user use cases. The logical flow-on from this is that end-users are not actively engaged in the development lifecycle.

Software projects, and in particular Agile software projects, are built on engagement with end-users: the expectation is that end-users will interact with developers on a daily basis. This requires certain skillsets that are well-developed in software engineers (well, to varying degrees), but few Network engineers have this experience.

Expectations of Interoperability

In general, network developers have a much higher expectation of out-of-the-box interoperability than software developers, notwithstanding the softwareisation of networks.

Experienced software developers typically have a high level of scepticism when it comes to claims of interoperability and will naturally plan in a validation process to ensure they understand how the product will actually work. Network engineers and architects appear to be more ready to accept claims of interoperability or standards compliance and don’t necessarily prepare for validation processes, except for first-time onboarding of equipment into a network.

But given the different natures of the products, an initial validation for a software product can have a relatively short life (as new updates can break this tested functionality), whereas initial validation of a hardware product has a much longer life.

Conclusion

The existence of these sources of dissonance, and more, can easily lead to project impairment if not anticipated and managed carefully.

In both project planning and execution, problems arise when one party wants to invest time into something (e.g. risk reserves or validation testing) that the other party doesn’t see the need for (and consequently believes is unjustified padding of the estimates) or simply doesn’t understand, leading to misunderstanding and miscommunication.

How do we manage this effectively? We treat everything as a software project.

Let us make your job easier.
Find out how Aptira's managed services can work for you.

Find Out Here

The post Real-World Open Networking. Part 5: Dissonance between Networks and Software Domains appeared first on Aptira.

by Adam Russell at October 05, 2019 01:20 PM

OpenStack Superuser

OpenStack Ironic Bare Metal Program case study: VEXXHOST

The OpenStack Foundation announced in April 2019 that its Ironic software is powering millions of cores of compute all over the world, turning bare metal into automated infrastructure ready for today’s mix of virtualized and containerized workloads.

Some 30 organizations joined for the initial launch of the OpenStack Ironic Bare Metal Program, and Superuser is running a series of case studies to explore how people are using it.

VEXXHOST provides high-performance cloud computing solutions that are cost conscious, complete, and widely flexible. In 2011, VEXXHOST adopted OpenStack software for its infrastructure. Since then, VEXXHOST has been an active contributor and an avid user of OpenStack. Currently, VEXXHOST provides infrastructure-as-a-service OpenStack public cloud, private cloud, and hybrid cloud solutions to customers, from small businesses to enterprises across the world.

Why did you select OpenStack Ironic for your bare metal provisioning in your product?

VEXXHOST has a long history of involvement with OpenStack technology, dating back to the Bexar release. We have since been powering all of our infrastructures using OpenStack. Taking advantage of Ironic for our bare metal provisioning seemed a natural next step in the continuous building out of our system and Ironic fit right in with each of our components, integrating easily with all of our existing OpenStack services.

As we offer multiple architectures, enterprise-grade GPUs, and various hardware options, the actual process of testing software deployments can pose a real challenge when it comes to speed and efficiency. However, we knew that choosing Ironic would resolve these difficulties, with the benefits being passed on to our users, in addition to enabling us to provide them with the option of deploying their private cloud on high-performing bare metal.

What was your solution before implementing Ironic?

Before VEXXHOST implemented OpenStack Ironic, we were using a system that we had built internally. For the most part, this system provided an offering of services that Ironic was already delivering on so it made sense to adopt it as opposed to maintaining our smaller version.

What benefits does Ironic provide your users?

Through Ironic, VEXXHOST’s users have access to fully dedicated and secure physical machines that can live in our data centres or theirs. Due to its physical and dedicated nature, the security provided by bare metal relieves VEXXHOST’s users of any risks associated with environment neighbours and thanks to the isolation factor, users are ensured that their data is never exposed to others. Ironic can also act as an automation tool for the centralized housing and management of all their machines and even enables our users to access certain features that aren’t available in virtual machines, like having multiple levels of virtual machines.

Additionally, VEXXHOST’s users benefit from Ironic’s notably simpler configuration and less complex set-up when compared to virtual machines. Where use cases require it, Ironic can also deliver to our users a higher level of performance than virtual machines. Through the region controller, our users benefit from high availability starting at the data center level and users are able to create and assign physical availability zones to better control critical availability areas. Through the use of Ironic, VEXXHOST can easily run any other OpenStack projects and configure our user’s bare metal specifically for their use cases. Ironic is also easily scaled from a few servers to multiple racks within a data centre and through their distributed gateways, makes it possible to process large parallel deployments. By using OpenStack technology, like Ironic, VEXXHOST ensures that users are never faced with the risks associated with vendor lock-in.
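As a rough illustration of what this looks like from a user's side, here is a minimal sketch using openstacksdk to list the bare metal nodes behind such a deployment. It assumes openstacksdk is installed, that a clouds.yaml entry named "mycloud" exists, and that the account has access to the bare metal API; it is not VEXXHOST-specific code.

import openstack

# Connect using the "mycloud" entry from clouds.yaml (an assumption for
# this sketch; substitute your own cloud name and credentials).
conn = openstack.connect(cloud="mycloud")

# List bare metal (Ironic) nodes with their provisioning and power states.
for node in conn.baremetal.nodes():
    print(node.name, node.provision_state, node.power_state)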

What feedback do you have to provide to the upstream OpenStack Ironic team?

Through our long-standing involvement with the OpenStack Community, based on VEXXHOST’s contributions and our CEO Mohammed Naser‘s role as OpenStack-Ansible PTL and member of the Technical Committee, we regularly connect with the Ironic team and have access to their conversations. Currently, there isn’t any feedback that we haven’t already shared with them.

Learn more

You’ll find an overview of Ironic on the project Wiki.
Discussion of the project takes place in #openstack-ironic on irc.freenode.net. This is a great place to jump in and start your ironic adventure. The channel is very welcoming to new users – no question is a wrong question!

The team also holds one-hour weekly meetings at 1500 UTC on Mondays in the #openstack-ironic room on irc.freenode.net, chaired by Julia Kreger (TheJulia) or Dmitry Tantsur (dtantsur).

Stay tuned for more case studies from organizations using Ironic.

Photo // CC BY NC

The post OpenStack Ironic Bare Metal Program case study: VEXXHOST appeared first on Superuser.

by Superuser at October 05, 2019 01:00 PM

October 04, 2019

Chris Dent

Fix Your Debt: Placement Performance Summary

There's a thread on the openstack-discuss mailing list, started in September and then continuing in October, about limiting planned scope for Nova in the Ussuri cycle so that stakeholders' expectations are properly managed. Although Nova gets a vast amount done per cycle, there is always some stuff left undone and some people surprised by that. In the midst of the thread, Kashyap points out:

I welcome scope reduction, focusing on fewer features, stability, and bug fixes than "more gadgetries and gongs". Which also means: less frenzy, less split attention, fewer mistakes, more retained concentration, and more serenity. [...] If we end up with bags of "spare time", there's loads of tech-debt items, performance (it's a feature, let's recall) issues, and meaningful clean-ups waiting to be tackled.

Yes, there are.

When Placement was extracted from Nova, one of the agreements the new project team made was to pay greater attention to tech-debt items, performance, and meaningful clean-ups. One of the reasons this was possible was that by being extracted, Placement vastly limited its scope and feature drive. Focused attention is easier and the system is contained enough that unintended consequences from changes are less frequent.

Another reason was that for several months my employer allowed me to devote effectively 100% of my time to upstream work. That meant that there was long term continuity of attention in my work. Minimal feature work combined with maximal attention leads to some good results.

In August I wrote up an analysis of some of that work in Placement Performance Analysis, explaining some of the things that were learned and changed. However, that analysis was comparing Placement code from the start of Train to Train in August. I've since repeated some of the measurements, comparing:

  1. Running Placement from the Nova codebase, using the stable/stein branch.
  2. Running Placement from the Placement codebase, using the stable/stein branch.
  3. Running Placement from the Placement codebase, using master, which at the moment is the same as what will become stable/train and be released as 2.0.0.

The same database (PostgreSQL) and web server (uwsgi using four processes of ten threads each) is used with each version of the code. The database is pre-populated with 7000 resource providers representing a suite of 1000 compute hosts with a moderately complex nested provider topology that is similar to what might be used for a virtualized network function.

The same query is used, whatever the latest microversion is for that version:

http://ds1:8000/allocation_candidates? \
                resources=DISK_GB:10& \
                required=COMPUTE_VOLUME_MULTI_ATTACH& \
                resources1=VCPU:1,MEMORY_MB:256& \
                required1=CUSTOM_FOO& \
                resources2=FPGA:1& \
                group_policy=none

(This is similar to what is used in the nested-perfload performance job in the testing gate, modified to work with all available microversions.)
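As a rough sketch of how such numbers can be gathered (this is not the author's harness, which produces ApacheBench-style output), the same query could be issued concurrently from Python along these lines. The endpoint, token variable and pinned microversion here are assumptions; adjust them for your own environment.

import os
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://ds1:8000/allocation_candidates"
PARAMS = {
    "resources": "DISK_GB:10",
    "required": "COMPUTE_VOLUME_MULTI_ATTACH",
    "resources1": "VCPU:1,MEMORY_MB:256",
    "required1": "CUSTOM_FOO",
    "resources2": "FPGA:1",
    "group_policy": "none",
}
HEADERS = {
    # Assumes a pre-fetched token; a noauth development setup may not need one.
    "X-Auth-Token": os.environ.get("OS_TOKEN", ""),
    # Pin a microversion supported by the service under test (assumed here).
    "OpenStack-API-Version": "placement 1.36",
}

def one_request(_):
    start = time.monotonic()
    status = requests.get(URL, params=PARAMS, headers=HEADERS).status_code
    return time.monotonic() - start, status

# 100 requests, 10 at a time, mirroring one of the scenarios below.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(one_request, range(100)))

times = [t for t, _ in results]
failures = sum(1 for _, status in results if status != 200)
print("mean time per request: %.3fs, failures: %d"
      % (sum(times) / len(times), failures))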

Here are some results, with some discussion after.

10 Serial Requests

Placement in Nova (stein)

Requests per second:    0.06 [#/sec] (mean)
Time per request:       16918.522 [ms] (mean)

Extracted Placement (stein)

Requests per second:    0.34 [#/sec] (mean)
Time per request:       2956.959 [ms] (mean)

Extracted Placement (train)

Requests per second:    1.37 [#/sec] (mean)
Time per request:       730.566 [ms] (mean)

100 Requests, 10 at a time

Placement in Nova (stein)

This one failed. The numbers say:

Requests per second:    0.18 [#/sec] (mean)
Time per request:       56567.575 [ms] (mean)

But of the 100 requests, 76 failed.

Extracted Placement (stein)

Requests per second:    0.41 [#/sec] (mean)
Time per request:       24620.759 [ms] (mean)

Extracted Placement (train)

Requests per second:    2.65 [#/sec] (mean)
Time per request:       3774.854 [ms] (mean)

The improvement between the versions in Stein (16.9s to 2.9s per request) was mostly made through fairly obvious architecture and code improvements found by inspection (or simply knowing it was not ideal when first made, and finally getting around to fixing it): things like removing the use of oslo versioned objects and changes to cache management to avoid redundant locks.

From Stein to Train (2.9s to .7s per request) the improvements were made by doing detailed profiling and benchmarking and pursuing a very active process of iteration (some of which is described by Placement Performance Analysis).

In both cases this was possible because people (especially me) had the "retained concentration" desired above by Kashyap. As a community OpenStack needs to figure out how it can enshrine and protect that attention and the associated experimentation and consideration for long term health. I was able to do it in part because I was able to get my employer to let me and in part because I overcommitted myself.

Neither of these things are true any more. My employer has called me inside, my upstream time will henceforth drop to "not much". I'm optimistic that we've established a precedent and culture for doing the right things in Placement, but it will be a challenge and I don't think it is there in general for the whole community.

I've written about some of these things before. If the companies making money off OpenStack are primarily focused on features (and being disappointed when they can't get those features into Nova) who will be focused on tech-debt, performance, and meaningful clean-ups? Who will be aware of the systems well enough to effectively and efficiently review all these proposed features? Who will clear up tech-debt enough that the systems are easier to extend without unintended consequences or risks?

Let's hit that Placement performance improvement some more, just to make it clear:

In the tests above, "Placement in Nova (stein)" failed with a concurrency of 10. I wanted to see at what concurrency "Extracted Placement (train)" would fail: at a concurrency of 150 over 1000 requests, some requests fail. At 140, all requests work, albeit slowly per request (33s). Based on the error messages seen, the failures at 150 are tied to the sizing and configuration of the web server and have nothing to do with the placement code itself. The way to get higher concurrency is to have more or larger web servers.

Remember that the nova version fails at concurrency of 10 with the exact same web server setup. Find the time to fix your debt. It will be worth it.

by Chris Dent at October 04, 2019 01:32 PM

OpenStack Superuser

OpenStack Ironic Bare Metal Program case study: China Mobile

The OpenStack Foundation announced in April 2019 that its Ironic software is powering millions of cores of compute all over the world, turning bare metal into automated infrastructure ready for today’s mix of virtualized and containerized workloads.

Over 30 organizations joined for the initial launch of the OpenStack Ironic Bare Metal Program, and Superuser is running a series of case studies to explore how people are using it.

China Mobile is a leading telecommunications services provider in mainland China. The Group provides full communications services in all 31 provinces, autonomous regions and directly-administered municipalities throughout Mainland China and in Hong Kong Special Administrative Region.

In 2018, the company was again selected as one of “The World’s 2,000 Biggest Public Companies” by Forbes magazine and for the Fortune Global 500 (100) by Fortune magazine, and it was recognized in the global carbon disclosure project CDP’s 2018 Climate A List for the third consecutive year, as the first and only company from Mainland China.

Why did you select OpenStack Ironic for bare metal provisioning in your product?

China Mobile has a large number of businesses running on various types of architectures such as x86 and Power servers, which provide high quality services to our business and customers. This number continues to increase by more than 100,000 every year. Recently we have built several cloud solutions based on OpenStack, as a Gold Member of the OpenStack Foundation, so our public cloud and private cloud solutions are compatible with OpenStack. Ironic focuses on the compute, storage and network resources that are matched with OpenStack, which is the core requirement of China Mobile’s bare metal cloud.

In addition, China Mobile’s physical IaaS solution includes multiple types of vendor hardware and solutions. Thanks to OpenStack Ironic’s improved architecture design and rich set of plug-in contributions, we could draw on reliable experience from the community while building our service.

What was your solution before implementing Ironic?

Before adopting Ironic, the best automation method we had was PXE + ISO + kickstart to achieve the relevant requirements. Due to its limitations in network, storage and even operating system compatibility, we had to manually manage all the processes involved. At the same time, due to the lack of relevant service data at the management level, workflow data could not be recorded well nor transferred in the course of work, which greatly reduced delivery efficiency.

What benefits does Ironic provide your users?

The biggest benefit of Ironic for us and our users is that it increases the efficiency of server delivery: what originally took a day or even weeks now takes half an hour to an hour. Based on Ironic, users can choose more of the operating systems they need, even on ARM Linux. Network resources such as Virtual Private Cloud (VPC), Load Balancer (LB) and Firewall (FW) can be freely configured through the combination of Ironic and Neutron. The same goes for Ironic and Cinder: that combination can provide users with Boot From Volume (BFV) and other disk array management or configuration capabilities. In short, through Ironic we can deliver a complete compute, network and storage server through a top-down process, without requiring operations and maintenance staff or users to handle information synchronization and manual configuration.

With Ironic, we built a platform for data center administrators that redefines the access standards for the different hardware they manage. Under it, all hardware vendors must comply with the management and data transmission protocols, or they should push their plug-ins to OpenStack. Administrators can then focus on management and serving users.

For China Mobile, hardware management or server OS delivery is sometimes not enough. We are extending our bare metal cloud to support applications, integrated through OpenStack Mistral and Ansible. All in all, we are continuously improving the ecosystem around Ironic to save our users time.

What feedback do you have to provide to the upstream OpenStack Ironic team?

We hope that Ironic will provide an operating system agent solution, like the qemu guest agent.

Learn more

You’ll find an overview of Ironic on the project Wiki. Discussion of the project takes place in #openstack-ironic on irc.freenode.net. This is a great place to jump in and start your Ironic adventure. The channel is very welcoming to new users – no question is a wrong question!

The team also holds one-hour weekly meetings at 1500 UTC on Mondays in the #openstack-ironic room on irc.freenode.net chaired by Julia Kreger (TheJulia) or Dmitry Tantsur (dtantsur).

Stay tuned for more case studies from organizations using OpenStack Ironic.

 

Photo // CC BY NC

The post OpenStack Ironic Bare Metal Program case study: China Mobile appeared first on Superuser.

by Superuser at October 04, 2019 01:00 PM

Aptira

Real-World Open Networking. Part 4 – Interoperability: Problems with API’s

Real-World Open Networking. Part 4 - Interoperability. Problems with API's

In our last post we looked at different general patterns of standards compliance in Open Network solutions. In this post we drill down another layer to look at interoperability at the Application Program Interface (API) level, which creates issues at a level beyond standards.

Background

As we’ve mentioned previously, network equipment has been focused on interface compatibility and interoperability for many decades and has a history of real interoperability success. Traditional networks exposed communications interfaces and most of the standards for network equipment focus on these interfaces.

But with the advent of network software equivalents to hardware devices, we open up new areas for problems.

Software components may implement the same types of communications interfaces, but they will also provide Application Program Interfaces (API’s) for interaction between themselves and other software components. These API’s may be the subject of standards, and thus the issues raised in the previous article may apply. Or they may simply be proprietary API’s, unique to the vendor.

So we need to take a look at how API’s can support interoperability and also the problems that occur in API implementation that make interoperability more challenging.

API Interoperability

There are a number of levels at which API’s are open and potentially interoperable, or not.

  • Availability of the specification and support by the vendor of third-party implementation (standard or proprietary)
  • Level of compliance with any documentation (standardised or not)
  • Ability of the underlying components to satisfy the exposed API

Previously, we covered the different degrees of compliance and the obstacles that they put in the way of successful Open Network solutions. In this post we’ll elaborate on the other two only.

Availability of the Interface Specification

Open Standards specifications are generally available, but often not freely available. Some organisations restrict specifications to varying levels of membership of their organisation. Sometimes only paid members can access the specifications.

Proprietary interfaces may be available under certain limited conditions or they may not be available at all. Availability is usually higher for de facto standards, because it enables the standards owner to exert some influence over the marketplace. Highly proprietary interfaces often have higher hurdles to obtain access, typically only if an actual customer requests the specification for itself or on behalf of a solution integrator.

Practical Accessibility in a Project

It’s one thing to get access to an API specification document, but it’s very much another to gain practical access to the information necessary to implement an interface to that API.

An Open Network solution may have hundreds of API’s in its inventory of components, or more. These API’s must be available for use by the solution designers. A typical solution is to publish these API’s in a searchable catalog. This might be ‘open’ in one sense, but not necessarily Interoperable.

Solution integrators must also have access to a support resource to help with issues arising from the implementation (bugs, etc). It is far too common for the API document to be of limited detail, inaccurate, and even out-of-date. The richness of this support resource and the availability of live support specialists will directly translate to implementation productivity.

Ability of the Underlying Components to Satisfy the API

Software has a number of successes at implementing syntactic and representational openness but not semantic openness. Using the REST standard as an example, I can post a correctly formatted and encoded payload to a REST endpoint, but unless the receiving application understands the semantic content then the interface doesn’t work.

And if the underlying components cannot service the request in a common (let alone standard) way, theoretical interoperability becomes difficult and/or constrained.

An NFV example may help.

Consider an NFV Orchestration use case that performs auto-scaling of NFV instances based on some measure of throughput against capacity. Most NFV components make it easy to obtain the required measures of the relevant metric via telemetry.

But it is the range of available metrics and the algorithms used to generate the metrics that introduces complexity and potentially impacts Interoperability.

One NFV vendor might provide this measure in terms of CPU utilisation at a total NFV level. Another might provide the CPU utilisation at a VM level. Or vendors may use different algorithms for calculating the metric that they call “CPU Utilisation” or may vary considerably in the timing of updates. Another vendor might not provide CPU utilisation at all but may provide a metric of packets per second.
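A hypothetical sketch of the problem this creates for an orchestrator: each vendor's telemetry needs its own adapter before a common scaling decision can be made. The vendor payloads, field names and threshold below are purely illustrative, not real vendor APIs.

from dataclasses import dataclass

@dataclass
class ScalingMetric:
    name: str
    value: float  # normalised utilisation in the range 0.0 - 1.0


def from_vendor_a(sample: dict) -> ScalingMetric:
    # Hypothetical vendor A reports total-NFV CPU utilisation as a percentage.
    return ScalingMetric("cpu_util", sample["cpu_util_percent"] / 100.0)


def from_vendor_b(sample: dict) -> ScalingMetric:
    # Hypothetical vendor B reports packets per second; normalise against
    # the rated capacity of the instance.
    return ScalingMetric("throughput_util",
                         sample["packets_per_sec"] / sample["rated_pps"])


def should_scale_out(metric: ScalingMetric, threshold: float = 0.8) -> bool:
    return metric.value >= threshold


print(should_scale_out(from_vendor_a({"cpu_util_percent": 85})))     # True
print(should_scale_out(from_vendor_b({"packets_per_sec": 600000.0,
                                      "rated_pps": 1000000.0})))     # False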

Conclusion

API’s play a significant role in the implementation of Open Network solutions and the achievement of interoperability. However, they are not a “silver bullet” and there can be many challenges. As with Standards compliance, API availability, and potentially compliance with a standard, cannot be assumed.

In the last few posts we’ve focused on software-related topics, but it’s time to bring back the Networking side of Open Networking for our last two posts. Leaving technology aside for the moment, how does a solution integrator deal with the different paradigms for solution implementation that can exist in an Open Networking project? We’ll cover that in the next post.

Stay tuned.

Ready to move your network into the software defined future?
Automate your network with ONAP.

Find Out How

The post Real-World Open Networking. Part 4 – Interoperability: Problems with API’s appeared first on Aptira.

by Adam Russell at October 04, 2019 04:41 AM

October 03, 2019

RDO

RDO is ready to ride the wave of CentOS Stream

The announcement and availability of CentOS Stream has the potential to improve RDO’s feedback loop to Red Hat Enterprise Linux (RHEL) development and smooth out transitions between minor and major releases. Let’s take a look at where RDO interacts with the CentOS Project and how this may improve our work and releases.

RDO and the CentOS Project

Because of its tight coupling with the operating system, the RDO project joined the CentOS SIGs initiative from the beginning. CentOS SIGs are smaller groups within the CentOS Project community focusing on a specific area or software type. RDO was a founding member of the CentOS Cloud SIG, which focuses on cloud infrastructure software stacks and uses the CentOS Community Build System (CBS) to build final releases.

In addition to the Cloud SIG OpenStack repositories, during release development the RDO Trunk repositories provide packages for new commits in OpenStack projects soon after they are merged upstream. After a commit is merged, a new package is created and a YUM repository is published on the RDO Trunk server, including this new package build and the latest builds for the rest of the packages in the same release. This enables packagers to identify packaging issues almost immediately after they are introduced, shortening the feedback loop to the upstream projects.

How CentOS Stream can help

A stable base operating system, on which continuously changing upstream code is built and tested, is a prerequisite. While CentOS Linux did come close to this ideal, there were still occasional changes in the base OS that were breaking OpenStack CI, especially after a minor CentOS Linux release where it was not possible to catch those changes before they were published.

The availability of the rolling-release CentOS Stream, announced alongside CentOS Linux 8, will help enable our developers to provide earlier feedback to the CentOS and RHEL development cycles before breaking changes are published. When breaking changes are necessary, it will help us adjust for them ahead of time.

A major release like CentOS Linux 8 is even more of a challenge. RDO managed the transition from EL6 to EL7 during the OpenStack Icehouse cycle by building two distributions in parallel – and that was five years ago, with a much smaller package set than it has now.

For the current OpenStack Train release in development, the RDO project started preparing for the Python 3 transition using Fedora 28, which helped to get this huge migration effort going. At the same time, Fedora 28 was only a rough approximation of RHEL 8/CentOS Linux 8 and required complete re-testing on RHEL.

Since CentOS Linux 8 is being released very close to the OpenStack Train release, the RDO project will initially provide RDO Train only on the EL7 platform and will add CentOS Linux 8 support to RDO Train soon after.

For future releases, the RDO project is looking forward to being able to start testing and developing against CentOS Stream updates as they are developed, to provide feedback, and to help stabilize the base OS platform for everyone!

About The RDO Project

The RDO project provides a freely available, community-supported distribution of OpenStack that runs on Red Hat Enterprise Linux (RHEL) and its derivatives, such as CentOS Linux. RDO also makes the latest OpenStack code available for continuous testing while the release is under development.

In addition to providing a set of software packages, RDO is also a community of users of cloud computing platforms on Red Hat-based operating systems where you can go to get help and compare notes on running OpenStack.

by apevec at October 03, 2019 08:26 PM

Aptira

Real-world Open Networking. Part 3 – Interoperability: Problems with Standards

Real-world Open Networking. Part 3 – Interoperability: Problems with Standards

In our last post we unpacked Interoperability, including Open Standards. Continuing this theme, we will look at how solution developers implement standards compliance and the problems that arise.

Introduction

Mandating that vendors (and internal systems) comply with Open Standards is a strategy used by organisations to drive interoperability. The assumption is that Open Standards compliant components will be interoperable.

In this post we examine the many reasons why that assumption does not always hold in real-world situations. This analysis will be from the software perspective, since network equipment generally does a better job of component interoperability than software does. We will cover the general aspects of standards compliance in this post, and the specific aspects of API’s in the next post.

Software Implementation & Interoperability based on Standards

Whether the standard is “de jure” or “de facto”, there are three basic approaches to implementing software compliance with the standards:

  • Reference implementation compatible
  • Reference document compatible
  • Architecture pattern or guideline compatible

Reference Implementation Compatible

This approach consists of two parts:

  • The standard is a controlling design input: i.e. compliance overrides other design inputs; and
  • Validation against a “reference implementation” of the standard.

A “reference implementation” is a software component that is warranted to comply with the standard and provides a known reference against which to validate a developed component. This should also include a set of standard test cases that verify compliance and/or highlight issues.

Vendors often provide the test results as evidence and characterisation of the level of compliance. 
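As a hypothetical sketch of what that validation can look like in practice, the snippet below runs the same test cases against a reference implementation and a candidate vendor component and compares the responses. The endpoints and test cases are illustrative only, not taken from any real standard's test suite.

import requests

# Illustrative endpoints (assumptions for this sketch).
REFERENCE = "http://reference.example.org/api"
CANDIDATE = "http://vendor.example.org/api"

# A tiny stand-in for a standard's test cases: (method, path, body).
TEST_CASES = [
    ("GET", "/version", None),
    ("POST", "/widgets", {"name": "w1", "size": 3}),
]

def call(base_url, method, path, body):
    resp = requests.request(method, base_url + path, json=body)
    return resp.status_code, resp.json()

# A case "passes" only when the candidate behaves like the reference.
for method, path, body in TEST_CASES:
    expected = call(REFERENCE, method, path, body)
    actual = call(CANDIDATE, method, path, body)
    print("%s %s: %s" % (method, path, "PASS" if expected == actual else "FAIL"))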

Benefits of this approach

This is the highest level of compliance possible against a standard. Two components that have been validated against the standard will be interoperable at the lowest common level to which they have both passed the test.

Problems with this approach

A reference implementation must exist and be available; however, this is not always the case. The reference implementation must be independently developed and certified, often by the standards body itself.

Reference Document Compatible

This approach is similar to the Reference Implementation approach. Firstly, the documented standard is a controlling design input. However the second part (validation against the standard) is both optional and highly variable. At the most basic level, compliance could be just that the vendor asserts component compliance with the standard. Alternatively, compliance may be validated by comparison between the developed component and the documentation, and there are many ways to do this at varying levels of accuracy and cost.

Benefits of this approach

The main benefit of this approach is that the design is driven to the standard, and at this level it is equivalent to the Reference Implementation approach.

Problems with this approach

Validation without a reference implementation is highly manual and potentially subject to interpretation. This type of validation is very expensive which creates cost pressure for vendors to only partially validate, especially on repeat version upgrades and enhancements.

Architecture Pattern Compatible

In this case the standard is used as one input to the design, but not as the controlling input. The intent is not compliance but alignment. The product may use the same or similar underlying technologies as defined in the standards (e.g. REST interfaces or the same underlying data representation standards such as XML or JSON). The vendor may adopt a component architecture similar to the standard’s (e.g. microservices).

Benefits of this approach

At best, this approach may provide a foundation for future compliance.

Problems with this approach

In general, the vendor is designing their product to be “not incompatible” with the standard, without taking on the cost of full compliance.

Rationale for Vendors to Implement Standards

Standards compliance is expensive to implement, regardless of the approach taken, so each vendor will take its own approach based on its own situation and context. A vendor may:

  • Completely ignore the standards issue:
    • Deliberately, e.g. a start-up whose early target customers don’t care.
    • Accidentally, if they are unaware of the standards.
  • Not see a competitive advantage in their marketplace: not so much as to justify the cost of standards implementation.
  • Adopt a customisation approach: in other words, implement standardisation when required.
  • Have full compliance in their roadmap for future implementation and simply want a foundation to build on.

Problems with compliance

There are a wide range of implementations and the results are highly variable. The important thing to remember is that a claim of “standards compliance” can mean many things.

From a starting point of the intent to comply (or at least claim compliance), and using any of the strategies above, a vendor can be non-compliant in many ways:

  • Partial implementation, e.g. a custom solution for one customer that is “productised”;
  • Defects in implementation, including misinterpretation of the standard;
  • Deliberate forking of the standard, including the implementation of superset functionality (“our solution is better than the standard”);
  • The incompatibility of underlying or related components;
  • Compliance with limited subsets of the standard, e.g. the most often used functions;
  • Some vendors may misrepresent compliance based on tenuous connections: e.g. a vendor might claim compatibility on the basis that their API’s are REST-based and nothing more.

Conclusion

Nothing can be assumed about standards compliance, other than that each vendor’s claims must be validated. The other part of this issue is Application Program Interfaces (API) interoperability. We will cover this in the next post. Stay tuned.

Become more agile.
Get a tailored solution built just for you.

Find Out More

The post Real-world Open Networking. Part 3 – Interoperability: Problems with Standards appeared first on Aptira.

by Adam Russell at October 03, 2019 03:55 AM

October 02, 2019

OpenStack Superuser

Meet the Shanghai Open Infrastructure Superuser Award nominees

Who do you think should win the Superuser Award for the Open Infrastructure Summit Shanghai?

When evaluating the nominees for the Superuser Award, take into account the unique nature of use case(s), as well as integrations and applications of open infrastructure by each particular team. Rate the nominees before October 8 at 11:59 p.m. Pacific Daylight Time.

Check out highlights from the five nominees and click on the links for the full applications:

  • Baidu ABC Cloud Group and Edge Security Team, who integrated Kata Containers into the fundamental platform for the entire Baidu internal and external cloud services, and who built a secured environment upon Kata Containers for the cloud edge scenario, respectively. Their cloud products (including VMs and bare metal servers) cover 11 regions including North and South China, 18 zones and 15 clusters (with over 5000 physical machines per cluster).
  • FortNebula Cloud, a one-man cloud show and true passion project run by Donny Davis, whose primary purpose is to give back something useful to the community, and secondary purpose is to learn how rapid fire workloads can be optimized on OpenStack. FortNebula has been contributing OpenDev CI resources since mid 2019, and currently provides 100 test VM instances which are used to test OpenStack, Zuul, Airship, StarlingX and much more. The current infrastructure sits in a single rack with one controller, two Swift, one Cinder and 9 compute nodes; total cores are 512 and total memory is just north of 1TB.
  • InCloud OpenStack Team, of Inspur, who has used OpenStack to build a mixed cloud environment that currently provides service to over 100,000 users, including over 80 government units in mainland China. Currently, the government cloud has provided 60,000+ virtual machines, 400,000+ vcpu, 30P+ storage for users, and hosts 11,000+ online applications.
  • Information Management Department of Wuxi Metro, whose Phase II of the Wuxi Metro Cloud Platform project involved the evolution from IaaS to PaaS on their private cloud platform based on OpenStack. In order to acquire IT resources on demand and improve overall business efficiency, Wuxi Metro adopted the Huayun Rail Traffic Cloud Solution, which features high reliability, high efficiency, ease of management and low cost.
  • Rakuten Mobile Network Organization, of Rakuten Inc., Japan, launched a new initiative to enter the mobile market space in Japan last year as the 4th Mobile Network Operator (MNO), with a cloud-based architecture based on OpenStack and Kubernetes. They selected to run their entire cloud infrastructure on commercial, off-the-shelf (COTS) x86 servers, powered by Cisco Virtualized Infrastructure Manager (CVIM), an OpenStack-based NFV platform. The overall plan is to deploy several thousand clouds running vRAN workloads spread across all of Japan to a target 5M mobile phone users. Their current deployment includes 135K cores, with a target of one million cores when complete.

Each community member can rate the nominees once by October 8 at 11:59 p.m. Pacific Daylight Time.

Previous winners include AT&T, City Network, CERN, China Mobile, Comcast, NTT Group, the Tencent TStack Team, and VEXXHOST.

The post Meet the Shanghai Open Infrastructure Superuser Award nominees appeared first on Superuser.

by Superuser at October 02, 2019 06:11 AM

Aptira

Real-world Open Networking. Part 2 – Interoperability: The Holy Grail

Real-world Open Networking. Part 2 – Interoperability: The Holy Grail

In our last post we described the attributes of an Open Network Solution. In this post we unpack what surely is the “Holy Grail” of Open Networks: Interoperability. Interoperability is defined as:

Interoperability is a characteristic of a product or system, whose interfaces are completely understood, to work with other products or systems, at present or in the future, in either implementation or access, without any restrictions

Wikipedia: https://en.wikipedia.org/wiki/Interoperability

Bottom line, interoperability means that I can freely substitute components: if I have Component X in my Open Networking solution, then in the future I can freely replace it with Component Y.

Today, components vary widely in that ability, across any of the component “form factors” we described in this post. But it’s not just technology components that play into our concept of interoperability.

Interoperability aspects of Open Systems

By definition, to be highly-interoperable, we need to be able to freely substitute the components along each of the dimensions of openness that we described earlier.

  • Open Standards
  • Open API’s
  • Open Partners
  • Open Source
  • Open Operations

Let’s look briefly at each of these in turn.

Open Standards

Probably the most obvious and most-used strategy for driving interoperability has been the definition and/or adoption of standards. This covers both those standards formally established by standards bodies (“de jure” standards) and those established informally by market dominance or pervasive use (“de facto” standards).

The idea of Open Standards does not really mean substituting different Standards for each other (although solution designers should probably consider Standard-switching costs in their design phase). Using Open Standards means selecting one standard for a particular functional area of the solution and driving the level of compliance to this standard, to allow the free switching of components within that functional area of the solution.

This is a complex subject which we are going to review in the next post.

Open API’s

From a software integration perspective, Application Programming Interfaces (API’s) are a fundamental way in which interoperability can be achieved. There are a number of perspectives to the “Openness” of an API:

  • Ease of accessibility to the specification
  • Access to a test environment or reference implementation
  • Use of common and open interface protocols e.g. REST

APIs and their interoperability issues are also quite a complex topic, and we’ll explore this in more detail in the next post plus one.

Open Partners

Being able to switch partners freely is key to open systems, but there are several factors to consider:

  • Contractual relationships: It makes sense to set up a firm relationship as a baseline but also to be able to modify or terminate the contract if circumstances require. Also relevant is the ability to partner in novel and creative ways, for example joint ventures and reseller arrangements.
  • Technology transparency: A vendor does not use or rely on proprietary components that cannot be transferred to other partners or used in-house by the solution owner.
  • Win-Win relationships: Open Partners depends on establishing and promoting a win-win relationship between supplier and customer, rather than one benefiting at the expense of the other.

Unfortunately, there have been many instances of vendors attempting (and succeeding) in locking customers in for the long term. This form of “rent seeking” generates ill-will and gives rise to many protective measures, some of them extreme and working against the idea of “Win-Win”.

Open Source

We discussed Open Source previously, but we didn’t examine it from an interoperability perspective. Open source addresses the technology transparency requirement of Open Partners – it is accessible to all and is therefore transparent. Although effort may be required to ramp up knowledge of a new vendor, Open Source components often have multiple sources of support, knowledge and resources that can assist in this process.

Open Operations

As mentioned in the last post, Open Operations means that operational processes are flexible and open to change, open to engagement, and transparent. In short, this summarises DevOps.

We reviewed the DevOps paradigm in an earlier article. Practices are open when new vendors can slot in without friction or significant overhead. There is a clean flow-through from business to development to operations that enables new vendors to rapidly pick up the pace of value-add. The use of common concepts, tools, and practice enables new vendors to understand where they fit in very quickly.

Conclusion

We can see from the above that all the attributes of Open solutions drive Interoperability. Most business cases and project plans contain risk assessments and sometimes financial provisions for dealing with the costs of change that may occur during a project: a failure of a partner, or a piece of technology or some other aspect of the solution will incur costs and delays as the new components are selected, validated and integrated into the solution.

The only real test of interoperability is that these risk events, if they occur, are orders of magnitude less difficult and costly, such that the risk provisions can be reduced or eliminated. We know what to aim for, but alas we are far from that stage. In this post we have covered some of the issues that prevent interoperability from being a safe assumption today. We’ll cover more in our next post.

Stay tuned.

Become more agile.
Get a tailored solution built just for you.

Find Out More

The post Real-world Open Networking. Part 2 – Interoperability: The Holy Grail appeared first on Aptira.

by Adam Russell at October 02, 2019 05:36 AM

Trinh Nguyen

"Searchlight for U" at the Korea&Vietnam OpenInfra User Group meetup

Last night, in a cozy conference room in Seoul, South Korea, I had a very friendly meetup with the OpenStack Korea User Group with around ten or so people. Sa Pham and I, the Vietnam OpenInfra User Group representatives, were there to share our experiences with OpenStack and to network with others. This is not my first time with the Korea User Group, but meeting people who work on open source projects or want to learn about OpenInfra technologies made me super excited.

Like last time, I gave a brief presentation about OpenStack Searchlight, showing folks what was going on and my plan for the Ussuri development cycle. And that is why the title of my talk was "Searchlight for U".


Even though I had not put much effort into Searchlight during Train, while presenting the progress I was amazed at how far we have come. I had been Searchlight's PTL for two cycles and am now serving one more. Hopefully, I can move the project forward with some real-world adoption and use cases, and especially by attracting more contributors.


We only had three presentations in total, and it went quickly because we wanted to spend more time on networking and getting to know each other. In the end, the Korea User Group organizers and I discussed our plan for the next OpenInfra study sessions. We then said goodbye and promised to hold this kind of event more frequently.

I really had a great time yesterday.

by Trinh Nguyen (noreply@blogger.com) at October 02, 2019 02:08 AM

October 01, 2019

Mirantis

Democratizing Connectivity with a Containerized Network Function Running on a K8s-Based Edge Platform — Q&A

The Facebook-initiated Magma project makes it possible to run Containerized Network Functions in an Edge Cloud environment, thus opening up a whole range of capabilities for providers that may otherwise be limited in what they can provide.

by Nick Chase at October 01, 2019 09:50 PM

OpenStack Superuser

Shanghai Superuser Award Nominee – FortNebula

It’s time for the community to help determine the winner of the Open Infrastructure Summit Shanghai Superuser Awards. The Superuser Editorial Advisory Board will review the nominees and determine the finalists and overall winner after the community has had a chance to review and rate nominees.

Now, it’s your turn.

FortNebula Cloud is one of five nominees for the Superuser Awards. Review the nomination criteria below, check out the other nominees and rate the nominees before the deadline October 8 at 11:59 p.m. Pacific Daylight Time.

Rate them here!

Who is the nominee?

FortNebula Cloud – Donny Davis

FortNebula Cloud was nominated by a community member, so we reached out to Davis to provide some extra context. See both the nomination, and Davis’ responses below.

How has open infrastructure transformed the organization’s business?

The FortNebula cloud is Davis’ mad scientist garage cloud. As such I’m not sure we can speak to how it has transformed culture or business, but we can be amazed at how one person is able to do so much with few resources. Davis does this largely on his own time at home, and is able to provide a good chunk of OpenDev’s CI resources.

Davis:

I don’t have a business or make any money off this cloud. This is completely privately funded. This project’s primary purpose is to give something useful back to the community, and its secondary purpose is to learn how rapid fire workloads can be optimized on OpenStack. I am a one man show that does this purely in my off-time… because building clouds that do real things is fun.

How has the organization participated in or contributed to an open source project?

FortNebula has been contributing OpenDev CI resources since about the end of July 2019. We currently get 100 test VM instances from FortNebula cloud which are used to test OpenStack, Zuul, Airship, StarlingX and much more.

What open source technologies does the organization use in its open infrastructure environment?

FortNebula cloud runs OpenStack deployed with TripleO. Other technologies that are used include gnocchi and grafana.

Davis:

FortNebula uses open source technologies for every single component where possible. The current inventory of software is Ansible, Puppet, OpenStack, CentOS, Ubuntu, FreeBSD, and pfSense.

What is the scale of your open infrastructure environment?

There are currently 9 OpenStack compute nodes. Unfortunately, I do not run the cloud, so I do not have detailed numbers for things like cores/memory. The information I do have can be found at https://grafana.fortnebula.com/d/9MMqh8HWk/openstack-utilization

Davis:

The current infrastructure sits in a single rack with one controller, two swift, one cinder and 9 compute nodes. Total cores are 512 and total memory is just north of 1TB.

What kind of operational challenges have you overcome during your experience with open infrastructure?

As part of the onboarding with OpenDev the FortNebula cloud has had to be refactored a couple times to better meet the demands of a CI environment. Nodepool may request many instances all at once and that has to be handled. Test node disk IO throughput was too slow in the initial build out which led to replacing storage with faster devices and centralizing instance root disk hosting.

Davis:

Well the first challenge to meet was IO, as my ceph storage on spinning disks did not perform well for the workload. I moved all the compute nodes to local storage built on NVME. The second was network performance. My network is unique in that I use BGP to each OpenStack tenant, and my edge router advertises a whole subnet for each tenant, which then uses a tunnel broker to provide direct IPv6 connectivity. Through working directly with the infra community we were able to use OpenStack itself to optimize the traffic flows so the infrastructure could keep up with the workloads.

How is this team innovating with open infrastructure?

The FortNebula cloud is showing that you can build an effective OpenStack cloud with a small number of institutional resources as well as human resources. In addition to that, FortNebula is a predominantly IPv6 first cloud. We are able to give every test instance a public IP address by embracing IPv6.

Davis:

FortNebula is a demonstration of what one person and one rack of old equipment can do. If some guy in his basement can build a CI grade cloud, just imagine what a team of dedicated people and real funding could do for a business. It’s also an example to show that open infra is not that hard, even for someone to do in their off time.

Each community member can rate the nominees once by October 8 at 11:59 p.m. Pacific Daylight Time.

The post Shanghai Superuser Award Nominee – FortNebula appeared first on Superuser.

by Superuser at October 01, 2019 08:13 PM

Shanghai Superuser Award Nominee: Baidu ABC Cloud Group & Security Edge teams

It’s time for the community to help determine the winner of the Open Infrastructure Summit Shanghai Superuser Awards. The Superuser Editorial Advisory Board will review the nominees and determine the finalists and overall winner after the community has had a chance to review and rate nominees.

Now, it’s your turn. 

The Baidu ABC Cloud Group and Security Edge team is one of five nominees for the Superuser Awards. Review the nomination criteria below, check out the other nominees and rate the nominees before the deadline October 8 at 11:59 p.m. Pacific Daylight Time.

 Rate them here!

Who is the nominee?

Baidu (Nasdaq: BIDU), the dominant Chinese search engine operator, the largest Chinese website in the world and a leading global AI company, has over 800,000 clients, more than 30,000 employees, and nearly 15,000 patents. In 2018, the company reported annual revenue of $14 billion.

Application units: the Baidu ABC (AI, Big data, and Cloud computing) Cloud Group, who integrated Kata Containers into the fundamental platform for the entire Baidu internal and external cloud services, and the Baidu Security Edge team, who built a secured environment upon Kata Containers for the cloud edge scenario.

Members: Xie Guangjun, Zhang Yu, He Fangshi, Wang Hui, Shen Jiale, Ni Xun, Hang Ding, Bai Yu, Zhou Yueqian, Wu Qiucai

How has open infrastructure transformed the organization’s business?

In 2019, our Kata Containers based products are enjoying market success in areas of FaaS (Function as a Service), CaaS (Container as a Service) and edge computing. Baidu’s cloud function computing service (CFC) based on Kata Containers provided computing power for nearly 20,000 skills of over 3,000 developers to run cloud function computing for Baidu DuerOS (a conversational AI operating system with a “100 million-scale” installation base). Baidu Container Instance service (BCI) has built a multitenant-oriented serverless data processing platform for the internal big data business of Baidu’s big data division. The Baidu Edge Computing (BEC) node is open to all clients while keeping them separated from each other for security and ensuring high performance.

How has the organization participated in or contributed to an open source project?

Baidu is very actively involved in collaboration across open source communities. They are a Gold Member of the CNCF, a Premier Member of the LF AI Foundation, the Hyperledger Foundation and the LF Edge Foundation, and a Silver Member of the Apache Software Foundation.

Baidu maintains more than 100 open source projects on GitHub, including Apollo, an open source autonomous driving platform, and PaddlePaddle, a deep learning framework, among others.

For Kata Containers, Baidu has made more than 16 functional patch modifications, of which 8 patch sets were contributed to the community. Baidu also published a white paper on Kata Containers production practice as a contribution to the community.

What open source technologies does the organization use in its open infrastructure environment?

Baidu has used Kata Containers to provide high performance and protect data security and the confidentiality of algorithms in different cloud computing scenarios. By using the OpenStack control plane, we seamlessly integrated Kata Container instances with Baidu cloud storage and network in BCI products.

Other technologies including QEMU-KVM/OpenStack (including Nova, Cinder, Glance and Neutron), Open vSwitch and Kubernetes are used in Kata Containers-based products. The open source device mapper (Linux kernel) and the qcow2 format are used in storage performance optimization, and DPDK is applied for network performance optimization.

What is the scale of your open infrastructure environment?

Baidu has more than 500,000 machines deployed with Linux kernel based on community version.

Baidu Cloud products (including virtual machines and bare metal servers) cover 11 regions including North and South China, 18 zones and 15 clusters (distributed in different regions and zones), covering tens of thousands of physical machines; one container cluster includes more than 5,000 physical machines.

What kind of operational challenges have you overcome during your experience with open infrastructure?

  • Support for mounting user code dynamically in containers – As Kata Containers’ host and guest do not share the same kernel, static mounting before startup is certainly possible, but mounting dynamically is a challenge.
  • Cold start performance optimization of cloud functions – After optimization, creation and startup performance can match that of runC.
  • Function density optimization – The higher the function density is, the more services a physical function can provide, thus lowering the cost.
  • Extensibility of hardware – Baidu’s cloud also provides products and services in big data and AI. Hardware such as GPU for AI requires passthrough to the inside of the container. By using Kata Containers, Baidu achieved extensibility of hardware in AI services.

How is this team innovating with open infrastructure?

  • The Baidu Edge Computing (BEC) product requires virtual machines that are very lightweight to create and release while preserving a similar device model. This product is based on Kata Containers, and the bottom layer uses a QEMU optimized by Baidu.
  • By integrating with modules such as Nova, Glance and Neutron on OpenStack, Baidu implemented co-location of container instance nodes and virtual machines.
  • Based on virtio-blk/virtio-scsi, Baidu optimized the file system of VM so that its performance is close to that of the host (single queue and single thread).
  • Baidu implemented a network scheme compatible with Neutron and Open vSwitch for network maintenance including network isolation and speed limit and reusing the previous network architecture.

Each community member can rate the nominees once by October 8 at 11:59 p.m. Pacific Daylight Time.

The post Shanghai Superuser Award Nominee: Baidu ABC Cloud Group & Security Edge teams appeared first on Superuser.

by Superuser at October 01, 2019 08:13 PM

Shanghai Superuser Award Nominee: Information Management Department of Wuxi Metro

It’s time for the community to help determine the winner of the Open Infrastructure Summit Shanghai Superuser Awards. The Superuser Editorial Advisory Board will review the nominees and determine the finalists and overall winner after the community has had a chance to review and rate nominees.

Now, it’s your turn.

The Information Management Department of Wuxi Metro is one of five nominees for the Superuser Awards. Review the nomination criteria below, check out the other nominees and rate the nominees before the deadline October 8 at 11:59 p.m. Pacific Daylight Time.

Rate them here!

Who is the nominee?

Information Management Department of Wuxi Metro 

How has open infrastructure transformed the organization’s business?

In 2019, Phase II of the Wuxi Metro Cloud Platform was accepted. A private cloud platform based on OpenStack was used in both phases. Phase II of the Wuxi Metro Cloud Platform project involved the evolution of the cloud platform from IaaS to PaaS and its seamless integration with the business, to ensure safe operation and sustainable business development for Wuxi Metro.

Upon the completion of Phase II of the Cloud Platform, Wuxi Metro will focus on its own business, with resources, network and services that are transparent and utilized on demand. With the functions provided at the business function layer and the various services provided at the service layer, business needs such as CI/CD process builds, big data development and business application development will be met quickly.

How has the organization participated in or contributed to an open source project?

In order to acquire IT resources on demand and improve overall business efficiency, Wuxi Metro adopted the Huayun Rail Traffic Cloud Solution, a cloud computing service customized for customers in the rail transit industry. The solution features high reliability, high efficiency, ease of management and low cost, and can help rail transit customers transform from traditional IT to cloud computing, improving the information system service in many aspects and helping business development.

What open source technologies does the organization use in its open infrastructure environment?

Cloud resource management: through support for multiple hypervisors, the platform can manage and schedule various virtual resources based on KVM, VMware, Xen and Hyper-V.

Heterogeneous resource management: the basic resource platform can manage existing physical servers and heterogeneous storage in a unified manner. Physical resources can be used as independent resources or integrated into the virtualized resource pool. Storage devices that support the OpenStack standard interface are managed centrally as the storage service.

What is the scale of your open infrastructure environment?

The project integrated nearly 100 servers into a unified computing resource pool providing nearly 1,500 virtual machines. It also integrated multiple sets of storage resources into a unified storage resource pool providing nearly 180 TB of capacity.

What kind of operational challenges have you overcome during your experience with open infrastructure? 

  • Security risk without the disaster recovery
  • Privilege confusion due to too many people involved
  • Unavoidable single point of failure
  • Scattered monitoring without a global view
  • Scattered devices which are difficult to manage
  • Performance decrease because of the data explosion
  • High cost and low utilization

How is this team innovating with open infrastructure?

  • Simplified management – Private cloud platform relies on the virtualization technology to implement effective integration of server resources, and private cloud platform involves specific functions, so the data center management difficulty is reduced greatly.
  • Effective utilization of resources – Perform the inventory evaluation of existing devices to reuse old devices, migrate all appropriate business systems to the cloud, benefit from efficient resource integration brought by the virtualization and cloud computing, utilize existing resources effectively, and reduce the number of devices.
  • High availability ensuring there is no service interruption
  • Fast business go-live
  • Efficient data protection
  • Business resource resilience
  • Improve the system security

Each community member can rate the nominees once by October 8 at 11:59 p.m. Pacific Daylight Time.

The post Shanghai Superuser Award Nominee: Information Management Department of Wuxi Metro appeared first on Superuser.

by Superuser at October 01, 2019 08:13 PM

Shanghai Superuser Award Nominee: InCloud OpenStack Team

It’s time for the community to help determine the winner of the Open Infrastructure Summit Shanghai Superuser Awards. The Superuser Editorial Advisory Board will review the nominees and determine the finalists and overall winner after the community has had a chance to review and rate nominees.

Now, it’s your turn. 

The InCloud OpenStack Team is one of five nominees for the Superuser Awards. Review the nomination below, check out the other nominees and rate the nominees before the deadline October 8 at 11:59 p.m. Pacific Daylight Time.

 Rate them here!

Who is the nominee? 

The InCloud OpenStack Team is comprised of over 100 members who developed an OpenStack-based private and hybrid cloud platform, a smart cloud operating system designed for the next generation of cloud data centers and cloud-native applications.

It consists of four sub-teams:

  • Product design: requirement analysis and interaction design
  • Product architecture: solution design and technology research
  • Product development: feature design and implementation
  • Operations support: deployment, troubleshooting.

As a Gold Member of the OpenStack Foundation, Inspur is actively involved in OpenStack community. Team members include Kaiyuan Qi, Zhiyuan Su, Brin Zhang, Guangfeng Su.

How has open infrastructure transformed the organization’s business?

The InCloud OpenStack team is committed to transforming Inspur into a new type of cloud service provider for government. InCloud OpenStack has ranked first in the government cloud market for five consecutive years. At present, we provide government cloud services to more than 80 government units in mainland China. We use OpenStack to build a mixed cloud environment for our customers, covering 100% of both traditional and cloud-native applications. The government cloud based on OpenStack reduces the time to bring a customer’s application system online from 6 months to less than 1 week, saves customers 45% of their server investment and reduces operation and maintenance costs by 55%. Currently, InCloud OpenStack provides cloud services for more than 100,000 cloud users.

How has the organization participated in or contributed to an open source project?

As a Gold Member of the OpenStack Foundation, Inspur is actively involved in the OpenStack community and is committed to being a top practitioner of OpenStack, supporting successful deployments in various industries and sharing optimization and large-scale deployment experience at the Austin and Denver Summits, OpenStack China Days, OpenStack China hackathons, and meetup technology exchanges. Inspur is also a member of the CNCF and the Linux Foundation, and a core and founding member of ODCC, OCP, and Open19. In the OpenStack project, Inspur has contributed over 58 commits and reported over 41 bugs or issues.

What open source technologies does the organization use in its open infrastructure environment?

All of Inspur Cloud’s development and CI/CD tools are built using open source technologies, including but not limited to: Chef, Ansible, Terraform, OpenStack, ELK, Kafka, Docker, Kubernetes, Jenkins, Go, Keepalived, etcd, Grafana, InfluxDB, Kibana, Git and OVS.

What is the scale of your open infrastructure environment?

Inspur’s public cloud and government cloud platforms adopt a variety of technical architectures and have a large overall scale, and they are progressively migrating to OpenStack. At present, the clusters running OpenStack total 5,000+ nodes, a number that will grow rapidly in the future. Currently, the government cloud provides 60,000+ virtual machines, 400,000+ vCPUs and 30+ PB of storage for users, and hosts 11,000+ online applications. Inspur Cloud is building opscenter tools based on Kubernetes and unified region management. At present, the number of Kubernetes pods is more than 5,000 (after the LCM migration project is launched, it is expected to reach 30,000+). Inspur’s DevOps cloud provides a CI/CD environment for more than 10,000 developers.

What kind of operational challenges have you overcome during your experience with open infrastructure?

OpenStack’s components depend on a message queue (MQ). When the cluster scale is large, MQ becomes the bottleneck to expansion; we use Nova cells v2 with dedicated MQ clusters to solve this problem. When a virtual machine has a large amount of memory and dirty data is generated faster than it can be transmitted, live migration fails; we use the post-copy and auto-converge features to solve this problem.

Hardware heterogeneity and expansion flexibility are pain points of cloud computing networks. Inspur’s cloud network solves the problem of flexible expansion of virtual network functions through a self-developed EIP cluster and secondary development of Neutron, and shields the heterogeneity of underlying devices through the self-developed cluster.

How is this team innovating with open infrastructure?

To ensure the reliability of key components that OpenStack depends on, such as MQ and DB, we built a self-developed system that implements MQ/DB fault monitoring and automated recovery. We also support innovations such as hot-add of virtual machine CPU and memory, prioritizing local resize of virtual machines, and hardware dongle (encryption key) support by modifying OpenStack’s code.

On the basis of open source OVS and OpenStack Neutron, its virtual network architecture adds key functions missing in open source systems such as network ACL, VPC peer-to-peer connection, hybrid VxLAN interconnection, EIP cluster and so on.
Inspur InCloud OpenStack 5.6 has completed a test with a single cluster size of up to 500 nodes, which is currently the largest single-cluster test based on OpenStack Rocky in the world.

Each community member can rate the nominees once by October 8 at 11:59 p.m. Pacific Daylight Time.

The post Shanghai Superuser Award Nominee: InCloud OpenStack Team appeared first on Superuser.

by Superuser at October 01, 2019 08:12 PM

Shanghai Superuser Award Nominee – Rakuten Mobile Network Organization

It’s time for the community to help determine the winner of the Open Infrastructure Summit Shanghai Superuser Awards. The Superuser Editorial Advisory Board will review the nominees and determine the finalists and overall winner after the community has had a chance to review and rate nominees.

Now, it’s your turn.

The Rakuten Mobile Network Organization team is one of five nominees for the Superuser Awards. Review the nomination criteria below, check out the other nominees and rate the nominees before the deadline October 8 at 11:59 p.m. Pacific Daylight Time.

Rate them here!

Who is the nominee?

Rakuten Mobile Network Organization, whose team consists of 100+ members.

Core leaders of the team:

Tareq Amin, Ashiq Khan, Ryota Mibu, Yusuke Takano, Masaaki Kosugi, Yuichi Koike, Yuka Takeshita, Rahul Atri, Shinya Kita, Vineet Singh, Mohamed Aslam, Jun Okada, Sharad Sriwastawa, Sushil Rawat, Michael Treasure.

How has open infrastructure transformed the organization’s business?

In June of 2018, Rakuten Inc., Japan, launched a new initiative to enter into the highly competitive mobile market space in Japan as the 4th Mobile Network Operator (MNO), so that they can own the entire customer experience over their network. Rakuten has decided to push the cloud technology boundary to its limits and, in this regard, has gone with a cloud-based architecture based on OpenStack and Kubernetes for its mobile network. In its goal to get a fully automated, highly efficient, cost optimized solution, Rakuten has chosen to run their entire cloud infrastructure on commercial, off-the-shelf (COTS) x86 servers, powered by Cisco Virtualized Infrastructure Manager (CVIM), an OpenStack-based NFV platform.

Open source technology has made this a reality in an extremely short timeframe.

How has the organization participated in or contributed to an open source project?

Rakuten is an active user of OpenStack technology. In this regard, they have pushed Cisco and Red Hat to backport features like trusted_vf and the Cinder multi-attach feature for the RBD backend to Queens. Also, since the entire network is IPv6, they are key proponents of getting IPv6 working in Kubernetes.

What open source technologies does the organization use in its open infrastructure environment?

Rakuten uses CVIM, Cisco’s OpenStack infrastructure manager designed for use in highly distributed telco network environments. Rakuten is also using Kubernetes for their container workload, which is hosted on CVIM as well. Cisco VIM is composed of many open source components along with OpenStack, such as Prometheus, Telegraf, Grafana (for monitoring), Elasticsearch, fluentd, and Kibana (for logging), and a variety of deployment and automation tools. The OPNFV toolsets, VMTP and NFVBench, are integrated with CVIM’s OpenStack deployment to prove out networking functionality and performance requirements, key to delivering telco-grade SLAs. Ceph is used to provide fault-tolerant storage.

What is the scale of your open infrastructure environment?

Rakuten mobile network, in its full scale, will consist of several thousand clouds, each of which will run VNFs and CNFs that are critical to the mobile world. All these clouds are currently based on the Queens release of OpenStack and are orchestrated by CVIM (OpenStack Queens). Some of the clouds also run the VNFMs, OSS/BSS systems and private cloud for customer multimedia data storage and sharing.

The overall plan is to deploy several thousand clouds running vRAN workloads spread across all of Japan, targeting 5 million mobile phone users. These are small edge clouds that run mobile-radio-specific workloads and need to be distributed across the country to be close to the antennas they control.

The deployment includes 135K cores, with a target of using up to a million cores when done.

What kind of operational challenges have you overcome during your experience with open infrastructure?

The main challenge associated with this network is the sheer number of clouds in the solution. Planning and operationalizing hardware and software updates/upgrades, rolling out new features, BIOS updates, security compliance, etc., and monitoring all of the clouds centrally with full automation is not only a challenge, but is pushing Rakuten and all its vendors towards the edge of technology. Also, the cloud runs solely over IPv6, which is a paradigm shift in the industry. In order to meet such immense challenges, Rakuten Mobile has developed an operations support system (OSS) which performs IP address generation and allocation, VNF instantiation and lifecycle management, and fully automated mobile base station commissioning, to name a few.

How is this team innovating with open infrastructure?

Given the size of the solution, automation is the only way forward. Rakuten has invested heavily in automating every operation possible, including cloud installation, updates, and reconfiguration over the REST API provided by CVIM. The cloud has also been adapted to handle low-latency workloads. In addition, Rakuten has created a staging lab where all vendors bring in their software; integration testing happens there and, once that passes, the software for each vendor is rolled out. A CI/CD system has also been developed that picks up software from each vendor and rolls it into the test lab for testing to commence.

Rakuten will also be one of the pioneers in offering mobile gaming and low latency applications from its edge data centers using true multi-access edge computing (MEC).

 

Each community member can rate the nominees once by October 8 at 11:59 p.m. Pacific Daylight Time.

The post Shanghai Superuser Award Nominee – Rakuten Mobile Network Organization appeared first on Superuser.

by Superuser at October 01, 2019 08:12 PM

Aptira

Real World Open Networking. Part 1 – Unpacking Openness

Aptira Real-world Open Networking – Part 1 – Unpacking Openness

In our last post in this series, we completed our coverage of the third domain of Open Networking, that of Open Network Integration. To wrap up our overall Open Networking series, we begin a set of posts that address the practical reality of delivering Open Network solutions in the current marketplace, particularly focusing on Interoperability.

Our first topic describes the attributes of an Open Network Solution.

What is an Open Network Solution?

Aptira describes Open Networking as the alignment of technology capabilities into a holistic practice that designs, builds and operates a solution. Successfully implementing and operating a network needs a lot more than just technology and practices. 

A mandatory aspect of Open Network solutions, of course, is that they are actually “open”. But in the industry this term can be somewhat vague and has many meanings depending on who you talk to. 

As part of the description and definition of Open Networks in this series of posts, Aptira puts forward our definition of “openness”, and describes how this definition can be practically applied to the solutions that we build. 

Openness – Key attributes of an Open Network Solution

Rather than completely “re-invent the wheel”, we’ve adapted and enhanced one of many existing models of Open Networking, that of Aruba Networks. Aptira’s extended model of the key attributes of an Open Network solution consists of five attributes that are essential for solution constructors to leverage existing components in a solution:

  • Open Standards – standards (both “de facto” and “de jure”) are prominent in the Open Networking domain. These include the protocols that devices use to interact within a network or externally across other networks;
  • Open APIs – the ability to “program your infrastructure as code” based on a set of documented, supported and robust APIs that allow solution constructors to use “composable infrastructure” – to some extent, Open APIs also fall under the standards capability, but we will discuss this in more detail;
  • Open (Partner) Ecosystem – enables rapid and frictionless onboarding of new infrastructure into solutions. Aptira strongly believes that an Open Partner Ecosystem must include the principle of “Open Skills” – what we call “teaching our customers to fish” – to ensure that customers do not remain dependent on vendor resources but build skills internally or at least can select alternative sources for required skillsets;
  • Open Source – the power and value of open source component availability is key, and we’ve examined this in an earlier post.
  • Open Operations – the ability of an operator to rapidly develop, deploy and operate complex integrated solutions sustainably over time, with high integrity and high flexibility. Since Aptira has added this to the original model, we’ll explain it in more detail below.

In the current marketplace, each of these attributes has practical implementation challenges, but they are all required if a network solution is to be considered truly open.

Open Operations

Aptira added “Open Operations” to the original model, as well as making some other adaptations.

Open Operations is as much related to the operator’s organisational capabilities and skillsets as it is to the technology and partner considerations in the other four pillars, but it is a critical success factor for an operator to successfully capture the promised value of the Open Networking proposition. It extends to the entire solution lifecycle, including the development phase, and is critical in reducing or eliminating the gaps and organisational transitions that occur in the rollout of capability from the development stage of the lifecycle into the production operations stages.

Open Operations is partly, and most importantly, fulfilled by DevOps, but it is more than that. Open Operations is tightly intertwined with the Open Partner Ecosystem to enable third-party participation in operational processes, as required to meet business objectives.

Why do we need Openness?

The holy grail of “openness” is Interoperability, or the ability to interchange components at will.

Whilst this term originally applied to just technical components (network equipment and software), in Open Network solutions it has a broader scope, covering technology, skills and processes amongst other things. Essentially, interoperability means no lock-in, either to a particular vendor or a particular technology choice or product line.

Given that Open Networking solutions are “multi-everything”: multi-vendor, multi-technology, multi-location, multi-product etc, this objective is very important.

We will expand on Interoperability in the next post.

Stay tuned.

Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Real World Open Networking. Part 1 – Unpacking Openness appeared first on Aptira.

by Adam Russell at October 01, 2019 02:58 AM

September 30, 2019

OpenStack Superuser

Review of Pod-to-Pod Communications in Kubernetes

Nowadays, Kubernetes has changed the way software development is done. As a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation, Kubernetes has proven itself to be a dominant player for managing complex microservices. Its popularity stems from the fact that Kubernetes meets the following needs: businesses want to grow and pay less, DevOps want a stable platform that can run applications at scale, developers want reliable and reproducible flows to write, test and debug code. Here is a good article to learn more about Kubernetes evolution and architecture.

One of the important areas of managing a Kubernetes network is forwarding container ports internally and externally to make sure containers and Pods can communicate with one another properly. To manage such communications, Kubernetes offers the following four networking models:

  • Container-to-Container communications
  • Pod-to-Pod communications
  • Pod-to-Service communications
  • External-to-internal communications

In this article, we dive into Pod-to-Pod communications by showing you ways in which Pods within a Kubernetes network can communicate with one another.

While Kubernetes is opinionated in how containers are deployed and operated, it is not prescriptive about how the network in which Pods run should be designed. Kubernetes imposes the following fundamental requirements on any networking implementation (barring any intentional network segmentation policies):

  • All pods can communicate with all other pods without NAT
  • All nodes running pods can communicate with all pods (and vice-versa) without NAT
  • The IP that a pod sees itself as is the same IP that other pods see it as

To illustrate these requirements, let us use a cluster with two nodes. Nodes are in subnet 192.168.1.0/24 and Pods use the 10.1.0.0/16 subnet, with 10.1.1.0/24 and 10.1.2.0/24 used by node1 and node2 respectively for the Pod IPs.

So, from the above Kubernetes requirements, the following communication paths must be established by the network.

  • Nodes should be able to talk to all pods. For example, 192.168.1.100 should be able to reach 10.1.1.2, 10.1.1.3, 10.1.2.2 and 10.1.2.3 directly (without NAT)
  • A Pod should be able to communicate with all nodes. For example, Pod 10.1.1.2 should be able to reach 192.168.1.100 and 192.168.1.101 without NAT
  • A Pod should be able to communicate with all Pods. For example, 10.1.1.2 should be able to communicate with 10.1.1.3, 10.1.2.2 and 10.1.2.3 directly (without NAT)

While exploring these requirements, we will lay the foundation for how the services are discovered and exposed. There can be multiple ways to design the network that meets Kubernetes networking requirements with varying degrees of complexity and flexibility.

Pod-to-Pod Networking and Connectivity

Kubernetes does not orchestrate setting up the network itself; it offloads the job to CNI plug-ins. Here is more info on CNI plugin installation. Below are possible network implementation options through CNI plugins that permit Pod-to-Pod communication while honoring the Kubernetes requirements:

  1. Layer 2 (switching) solution
  2. Layer 3 (routing) solution
  3. Overlay solutions


I- Layer 2 Solution

This is the simplest approach and should work well for small deployments. Pods and nodes should see the subnet used for Pod IPs as a single L2 domain. Pod-to-Pod communication (on the same host or across hosts) happens through ARP and L2 switching. We could use the bridge CNI plug-in to reuse an L2 bridge for pod containers with the below configuration on node1 (note the /16 subnet).

 

{
  "name": "mynet",
  "type": "bridge",
  "bridge": "kube-bridge",
  "isDefaultGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.1.0.0/16"
  }
}

 

kube-bridge needs to be pre-created such that ARP packets go out on the physical interface. To achieve this, we have another bridge with the physical interface connected to it and the node IP assigned to it, to which kube-bridge is hooked through a veth pair, as shown below.

We can pass a pre-created bridge, in which case the bridge CNI plugin will reuse it.
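As a rough illustration of that wiring, here is a hedged sketch using iproute2 on node1 (the physical interface name eth0, the outer bridge name br0 and the veth pair names are assumptions for illustration, not part of the original article):

# Outer bridge that owns the physical NIC and the node IP (192.168.1.100)
ip link add br0 type bridge
ip link set eth0 master br0
ip addr add 192.168.1.100/24 dev br0
# Bridge that the bridge CNI plug-in will reuse for Pod containers
ip link add kube-bridge type bridge
# veth pair hooking kube-bridge to br0 so ARP/L2 traffic reaches the physical network
ip link add veth-kube type veth peer name veth-br
ip link set veth-kube master kube-bridge
ip link set veth-br master br0
ip link set dev br0 up
ip link set dev kube-bridge up
ip link set dev veth-kube up
ip link set dev veth-br up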

II- Layer 3 Solutions

A more scalable approach is to use node routing rather than switching the traffic to the Pods. We could use the bridge CNI plug-in to create a bridge for Pod containers with a gateway configured. For example, on node1 the configuration below can be used (note the /24 subnet).

{
  "name": "mynet",
  "type": "bridge",
  "bridge": "kube-bridge",
  "isDefaultGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.1.1.0/24"
  }
}

So how does Pod1 with IP 10.1.1.2 running on node1 communicate with Pod3 with IP 10.1.2.2 running on node2? We need a way for nodes to route the traffic to other node Pod subnets.

We could populate the default gateway router with routes for the subnets as shown in the below diagram. Routes to 10.1.1.0/24 and 10.1.2.0/24 are configured to be through node1 and node2 respectively. We could automate keeping the route tables updated as nodes are added to or deleted from the cluster. We can also use some of the container networking solutions which can do the job on public clouds, e.g. Flannel’s backend for AWS and GCE, Weave’s AWS-VPC mode, etc.

Alternatively, each node can be populated with routes to the other subnets as shown in the below diagram and sketched after this paragraph. Again, updating the routes can be automated in a small or static environment as nodes are added to or deleted from the cluster, or container networking solutions like Calico or Flannel’s host-gateway backend can be used.
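As a minimal sketch of that second option, using the example node and Pod subnets above and plain iproute2 (no CNI-specific tooling), the per-node routes could look like this:

# On node1 (192.168.1.100): reach node2's Pod subnet via node2
ip route add 10.1.2.0/24 via 192.168.1.101
# On node2 (192.168.1.101): reach node1's Pod subnet via node1
ip route add 10.1.1.0/24 via 192.168.1.100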

III- Overlay Solutions

Unless there is a specific reason to use an overlay solution, it generally does not make sense given the networking model of Kubernetes and its lack of support for multiple networks. Kubernetes requires that nodes be able to reach each Pod, even though Pods are in an overlay network. Similarly, Pods should be able to reach any node as well. We will need host routes on the nodes set up such that Pods and nodes can talk to each other.

Since inter-host Pod-to-Pod traffic should not be visible in the underlay, we need a virtual/logical network that is overlaid on the underlay. Pod-to-Pod traffic would need to be encapsulated at the source node. The encapsulated packet is then forwarded to the destination node, where it is de-encapsulated. A solution can be built around any existing Linux encapsulation mechanism. We need a tunnel interface (with VXLAN, GRE, etc. encapsulation) and a host route such that inter-node Pod-to-Pod traffic is routed through the tunnel interface.

Below is a very generalized view of how an overlay solution can be built that meets the Kubernetes network requirements. Unlike the two previous solutions, there is significant effort in the overlay approach in setting up tunnels, populating FDBs, etc. Existing container networking solutions like Weave and Flannel can be used to set up a Kubernetes deployment with overlay networks. Here is a good article for reading more on similar Kubernetes topics.
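As a very generalized, hedged sketch of the overlay idea (the VNI, interface names and manual route are illustrative assumptions; real solutions such as Flannel or Weave automate this and also populate the forwarding database), a VXLAN tunnel plus host route on node1 might look like:

# On node1: VXLAN tunnel interface towards node2 (VNI 42, standard UDP port 4789)
ip link add vxlan42 type vxlan id 42 dstport 4789 local 192.168.1.100 remote 192.168.1.101 dev eth0
ip link set vxlan42 up
# Route node2's Pod subnet through the tunnel so inter-node Pod traffic is encapsulated
ip route add 10.1.2.0/24 dev vxlan42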

 

Conclusion

In this article, we covered how cross-node Pod-to-Pod networking works, and how services are exposed within the cluster to the Pods and externally. What makes Kubernetes networking interesting is how the design of core concepts like services, network policy, etc. permits several possible implementations. Though some core components and add-ons provide default implementations, they are replaceable. There is a whole ecosystem of network solutions that plug neatly into the Kubernetes networking semantics. Now that you have learned how Pods inside a Kubernetes system can communicate and exchange data, you can move on to learn about other Kubernetes networking models such as Container-to-Container or Pod-to-Service communications. Here is a good article for learning more advanced topics on Kubernetes development.

 

About the Author

This article was written by Matt Zand, the founder of High School Technology Services, DC Web Makers and Coding Bootcamps. He has written extensively on advanced topics in web design, mobile app development and blockchain. He is a senior editor at Touchstone Words, where he writes and reviews coding and technology articles. He is also a senior instructor and developer living in Washington, DC. You can follow him on LinkedIn.

Photo // CC BY NC

The post Review of Pod-to-Pod Communications in Kubernetes appeared first on Superuser.

by Matt Zand at September 30, 2019 02:00 PM

Galera Cluster by Codership

EverData reports Galera Cluster outshines Amazon Aurora and RDS

EverData, a leading data center and cloud solution provider in India, has recently been writing quite a bit about Galera Cluster, and it seems like we should highlight them. For one, they’ve talked about streaming replication in Galera Cluster 4, available in MariaDB Server 10.4. However, let us focus on their post: Galera Cluster vs Amazon RDS: A Comprehensive Review.

They compared MySQL with MHA, MySQL with Galera Cluster, and Amazon Web Services (AWS) Relational Database Service (RDS). Their evaluation criteria were how quickly each setup would recover after a crash, as well as performance while managing concurrent reads and writes.

In their tests, they found that failover time with Galera Cluster was between 8-10 seconds, whereas Aurora (so they were not using Amazon RDS?) took between 15-51 seconds, and MySQL with MHA took 140 seconds (which seems excessively high, considering this solution has been known to do sub-10 second failovers, so perhaps some configuration tuning was needed?).

In terms of performance, MySQL with MHA comes out the winner over Galera Cluster, but Galera Cluster comes out ahead of RDS (is this RDS MySQL or Aurora?). This can be explained simply: MySQL with MHA was likely configured with asynchronous rather than semi-synchronous replication, and you’re only writing to one node as opposed to three nodes. (MHA setups recommend the use of semi-synchronous replication; also, later in the report there is a note about replication lag.)

Some takeaways in conclusion, straight from their report:

  • “If HA and low failover time are the major factors, then MySQL with Galera is the right choice.”
  • High Availability: “MySQL/Galera was more efficient and consistent, but the RDS didn’t justify the episodes of replication lags.”
  • Performance: “MySQL/Galera outperformed the RDS in all tests — by the execution time, number of transactions, and rows managed.”

“we can see that MySQL/Galera manages the commit phase part more efficiently along with the replication and data validation.” Source: EverData

This looks extremely positive from a Galera Cluster standpoint, and we thank the team at EverData for such a report. It is an interesting read, so take a look at Galera Cluster vs Amazon RDS: A Comprehensive Review.

 

 

by Sakari Keskitalo at September 30, 2019 11:51 AM

September 28, 2019

Christopher Smart

Using pipefail with shell module in Ansible

If you’re using the shell module with Ansible and piping the output to another command, it might be a good idea to set pipefail. This way, if the first command fails, the whole task will fail.

For example, let’s say we’re running this silly task to look for the /tmp directory and then strip the characters “tmp” from the result.

ansible all -i "localhost," -m shell -a \
'ls -ld /tmp | tr -d tmp'

This will return something like this, with a successful return code.

localhost | CHANGED | rc=0 >>
drwxrwxrw. 26 roo roo 640 Se 28 19:08 /

But, let’s say the directory doesn’t exist, what would the result be?

ansible all -i "localhost," -m shell -a \
'ls -ld /tmpnothere | tr -d tmp'

Still success, because the piped tr command was successful, even though we can see the ls command failed.

localhost | CHANGED | rc=0 >>
ls: cannot access ‘/tmpnothere’: No such file or directory

This time, let’s set pipefail first.

ansible all -i "localhost," -m shell -a \
'set -o pipefail && ls -ld /tmpnothere | tr -d tmp'

This time it fails, as expected.

localhost | FAILED | rc=2 >>
ls: cannot access ‘/tmpnothere’: No such file or directorynon-zero return code

If /bin/sh on the remote node does not point to bash then you’ll need to pass in an argument specifying bash as the executable to use for the shell task.

  - name: Silly task
    shell: set -o pipefail && ls -ld /tmp | tr -d tmp
    args:
      executable: /usr/bin/bash

Ansible lint will pick these things up for you, so why not run it across your code 😉

by Chris at September 28, 2019 10:43 AM

September 27, 2019

Chris Dent

Placement Update 19-∞

Let's call this placement update 19-∞, as this will be my last one. It's been my pleasure to provide this service for nearly three years. I hope it has been as useful to others as it has been for me. The goal all along was to provide some stigmergic structures to augment how we, the placement team, collaborated.

I guess it worked: yesterday we released a candidate that will likely become the Train version of placement. The first version where the only placement you can get is from its own repo/project. Thanks to everyone who has made this possible over the years.

Most Important

Tetsuro will be the next placement PTL. He and Gibi are working on the project update and other Summit/PTG-related activities for Shanghai. If you have thoughts on that, please contact them.

The now worklist I made a couple weeks ago has had some progress, but some tasks remain. None of them were critical for the release, except perhaps the ongoing need for better documentation of how to most effectively use the service.

Since we're expecting the Ussuri release to be one where we consolidate how other services make use of placement, most of the items on that list can fit well with that.

We should be on the lookout for bugs reported by people trying out the release candidate(s).

Stories/Bugs

(Numbers in () are the change since the last pupdate.)

There are 21 (-2) stories in the placement group. 0 (0) are untagged. 7 (2) are bugs. 2 (-2) are cleanups. 9 (-1) are rfes. 4 (-1) are docs.

If you're interested in helping out with placement, those stories are good places to look.

osc-placement

There are several osc-placement changes, many of which are related to the cutting of a stable/train branch.

Main Themes

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting.

Cleanup

Cleanup is an overarching theme related to improving documentation, performance and the maintainability of the code. The changes we made this cycle are fairly complex to use and were fairly complex to write. We need to make sure, with the coming cycle, that we help people use them well.

Other Placement

Miscellaneous changes can be found in the usual place.

There are two os-traits changes being discussed. And zero os-resource-classes changes.

Other Service Users

Since we're in RC period I'll not bother listing pending changes. During Ussuri the placement team hopes to be available to facilitate other projects using the service. Keeping track of those is what this section has been trying to do. Besides general awareness of what's going on, I've also used this gerrit query to find things that might be related to placement.

End

🙇

by Chris Dent at September 27, 2019 01:32 PM

September 23, 2019

Mirantis

PaaS vs KaaS: What’s the difference, and when does it matter?

Earlier this month I had the pleasure of addressing the issue of Platform as a Service vs Kubernetes as a Service. We talked about the differences between the two modes, and their relative strengths and weaknesses.

by Nick Chase at September 23, 2019 01:49 PM

StackHPC Team Blog

Bespoke Bare Metal: Ironic Deploy Templates

Ironic's mascot, Pixie Boots

Iron is solid and inflexible, right?

OpenStack Ironic's Deploy Templates feature brings us closer to a world where bare metal servers can be automatically configured for their workload.

In this article we discuss the Bespoke Bare Metal (slides) presentation given at the Open Infrastructure summit in Denver in April 2019.

BIOS & RAID

The most requested features driving the deploy templates work are dynamic BIOS and RAID configuration. Let's consider the state of things prior to deploy templates.

Ironic has for a long time supported a feature called cleaning. This is typically used to perform actions to sanitise hardware, but can also perform some one-off configuration tasks. There are two modes - automatic and manual. Automatic cleaning happens when a node is deprovisioned. A typical use case for automatic cleaning is shredding disks to remove sensitive data. Manual cleaning happens on demand, when a node is not in use. The following diagram shows a simplified view of the node states related to cleaning.

Ironic cleaning states (simplified)

Cleaning works by executing a list of clean steps, which map to methods exposed by the Ironic driver in use. Each clean step has the following fields:

  • interface: One of deploy, power, management, bios, raid
  • step: Method (function) name on the driver interface
  • args: Dictionary of keyword arguments
  • priority: Order of execution (higher runs earlier)
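For illustration, here is roughly how an operator might run manual cleaning with a list of clean steps using the bare metal CLI. This is a sketch only; the node name and steps file are placeholders, and the node must first be moved to the manageable state:

# Move the node into the manageable state
openstack baremetal node manage node-1
# Run the clean steps described in clean-steps.json
openstack baremetal node clean node-1 --clean-steps clean-steps.json
# Return the node to the available state once cleaning succeeds
openstack baremetal node provide node-1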

BIOS

BIOS configuration support was added in the Rocky cycle. The bios driver interface provides two clean steps:

  • apply_configuration: apply BIOS configuration
  • factory_reset: reset BIOS configuration to factory defaults

Here is an example of a clean step that uses the BIOS driver interface to disable HyperThreading:

{
  "interface": "bios",
  "step": "apply_configuration",
  "args": {
    "settings": [
      {
        "name": "LogicalProc",
        "value": "Disabled"
      }
    ]
  }
}

RAID

Support for RAID configuration was added in the Mitaka cycle. The raid driver interface provides two clean steps:

  • create_configuration: create RAID configuration
  • delete_configuration: delete all RAID virtual disks

The target RAID configuration must be set in a separate API call prior to cleaning.

{
  "interface": "raid",
  "step": "create_configuration",
  "args": {
    "create_root_volume": true,
    "create_nonroot_volumes": true
  }
}
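For reference, the separate call to set the target RAID configuration can be made through the bare metal CLI before cleaning is triggered. A sketch, with a placeholder node name and file:

cat << EOF > raid-config.json
{
  "logical_disks": [
    {"size_gb": 42, "raid_level": "1", "is_root_volume": true}
  ]
}
EOF

openstack baremetal node set node-1 --target-raid-config raid-config.json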

Of course, support for BIOS and RAID configuration is hardware-dependent.

Limitations

While BIOS and RAID configuration triggered through cleaning can be useful, it has a number of limitations. The configuration is not integrated into Ironic node deployment, so users cannot select a configuration on demand. Cleaning is not available to Nova users, so it is accessible only to administrators. Finally, the requirement for a separate API call to set the target RAID configuration is quite clunky, and prevents the configuration of RAID in automated cleaning.

With these limitations in mind, let's consider the goals for bespoke bare metal.

Goals

We want to allow a pool of hardware to be applied to various tasks, with an optimal server configuration used for each task. Some examples:

  • A Hadoop node with Just a Bunch of Disks (JBOD)
  • A database server with mirrored & striped disks (RAID 10)
  • A High Performance Computing (HPC) compute node, with tuned BIOS parameters

In order to avoid partitioning our hardware, we want to be able to dynamically configure these things when a bare metal instance is deployed.

We also want to make it cloudy. It should not require administrator privileges, and should be abstracted from hardware specifics. The operator should be able to control what can be configured and who can configure it. We'd also like to use existing interfaces and concepts where possible.

Recap: Scheduling in Nova

Understanding the mechanics of deploy templates requires a reasonable knowledge of how scheduling works in Nova with Ironic. The Placement service was added to Nova in the Newton cycle, and extracted into a separate project in Stein. It provides an API for tracking resource inventory & consumption, with support for both quantitative and qualitative aspects.

Let's start by introducing the key concepts in Placement.

  • A Resource Provider provides an Inventory of resources of different Resource Classes
  • A Resource Provider may be tagged with one or more Traits
  • A Consumer may have an Allocation that consumes some of a Resource Provider’s Inventory

Scheduling Virtual Machines

In the case of Virtual Machines, these concepts map as follows:

  • A Compute Node provides an Inventory of vCPU, Disk & Memory resources
  • A Compute Node may be tagged with one or more Traits
  • An Instance may have an Allocation that consumes some of a Compute Node’s Inventory

A hypervisor with 35GB disk, 5825MB RAM and 4 CPUs might have a resource provider inventory record in Placement accessed via GET /resource_providers/{uuid}/inventories that looks like this:

{
    "inventories": {
        "DISK_GB": {
            "allocation_ratio": 1.0, "max_unit": 35, "min_unit": 1,
            "reserved": 0, "step_size": 1, "total": 35
        },
        "MEMORY_MB": {
            "allocation_ratio": 1.5, "max_unit": 5825, "min_unit": 1,
            "reserved": 512, "step_size": 1, "total": 5825
        },
        "VCPU": {
            "allocation_ratio": 16.0, "max_unit": 4, "min_unit": 1,
            "reserved": 0, "step_size": 1, "total": 4
        }
    },
    "resource_provider_generation": 7
}

Note that the inventory tracks all of a hypervisor's resources, whether they are consumed or not. Allocations track what has been consumed by instances.

Scheduling Bare Metal

The scheduling described above for VMs does not apply cleanly to bare metal. Bare metal nodes are indivisible units, and cannot be shared by multiple instances or overcommitted. They're either in use or not. To resolve this issue, we use Placement slightly differently with Nova and Ironic.

  • A Bare Metal Node provides an Inventory of one unit of a custom resource
  • A Bare Metal Node may be tagged with one or more Traits
  • An Instance may have an Allocation that consumes all of a Bare Metal Node’s Inventory

If we now look at the resource provider inventory record for a bare metal node, it might look like this:

{
    "inventories": {
        "CUSTOM_GOLD": {
            "allocation_ratio": 1.0,
            "max_unit": 1,
            "min_unit": 1,
            "reserved": 0,
            "step_size": 1,
            "total": 1
        }
    },
    "resource_provider_generation": 1
}

We have just one unit of one resource class, in this case CUSTOM_GOLD. The resource class comes from the resource_class field of the node in Ironic, upper-cased, and with a prefix of CUSTOM_ to denote that it is a custom resource class as opposed to a standard one like VCPU.
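For example, the resource class in this case could have been assigned to the node with a command along these lines (the node name is hypothetical); the matching CUSTOM_GOLD inventory then follows from it:

# Sets the Ironic resource_class field, which maps to CUSTOM_GOLD in Placement
openstack baremetal node set gold-node-1 --resource-class gold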

What sort of Nova flavor would be required to schedule to this node?

openstack flavor show bare-metal-gold -f json \
    -c name -c ram -c properties -c vcpus -c disk
{
  "name": "bare-metal-gold",
  "vcpus": 4,
  "ram": 4096,
  "disk": 1024,
  "properties": "resources:CUSTOM_GOLD='1',
                 resources:DISK_GB='0',
                 resources:MEMORY_MB='0',
                 resources:VCPU='0'"
}

Note that the standard fields (vcpus etc.) may be specified for informational purposes, but should be zeroed out using properties as shown.
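As a sketch, the equivalent properties could be applied to an existing flavor like this:

openstack flavor set bare-metal-gold \
    --property resources:CUSTOM_GOLD=1 \
    --property resources:VCPU=0 \
    --property resources:MEMORY_MB=0 \
    --property resources:DISK_GB=0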

Traits

So far we have covered scheduling based on quantitative resources. Placement uses traits to model qualitative resources. These are associated with resource providers. For example, we might query GET /resource_providers/{uuid}/traits for a resource provider that has an FPGA to find some information about the class of the FPGA device.

{
    "resource_provider_generation": 1,
    "traits": [
        "CUSTOM_HW_FPGA_CLASS1",
        "CUSTOM_HW_FPGA_CLASS3"
    ]
}

Ironic nodes can have traits assigned to them, in addition to their resource class: GET /nodes/{uuid}?fields=name,resource_class,traits:

{
  "Name": "gold-node-1",
  "Resource Class": "GOLD",
  "Traits": [
    "CUSTOM_RAID0",
    "CUSTOM_RAID1",
  ]
}
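Traits can be assigned to a node with the bare metal CLI, for example (same hypothetical node name as above, assuming a recent client):

openstack baremetal node add trait gold-node-1 CUSTOM_RAID0 CUSTOM_RAID1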

Similarly to quantitative scheduling, traits may be specified via a flavor when creating an instance.

openstack flavor show bare-metal-gold -f json -c name -c properties
{
  "name": "bare-metal-gold",
  "properties": "resources:CUSTOM_GOLD='1',
                 resources:DISK_GB='0',
                 resources:MEMORY_MB='0',
                 resources:VCPU='0',
                 trait:CUSTOM_RAID0='required'"
}

This flavor will select bare metal nodes with a resource_class of CUSTOM_GOLD, and a list of traits including CUSTOM_RAID0.

To allow Ironic to take action based upon the requested traits, the list of required traits is stored in the Ironic node object under the instance_info field.

Ironic deploy steps

The Ironic deploy steps framework was added in the Rocky cycle as a first step towards making the deployment process more flexible. It is based on the clean step model described earlier, and allows drivers to define steps available to be executed during deployment. Here is the simplified state diagram we saw earlier, this time highlighting the deploying state in which deploy steps are executed.

Ironic deployment states (simplified)

Each deploy step has:

  • interface: One of deploy, power, management, bios, raid
  • step: Method (function) name on the driver interface
  • args: Dictionary of keyword arguments
  • priority: Order of execution (higher runs earlier)

Notice that this is the same as for clean steps.

The mega step

In the Rocky cycle, the majority of the deployment process was moved to a single step called deploy on the deploy interface with a priority of 100. This step roughly does the following:

  • power on the node to boot up the agent
  • wait for the agent to boot
  • write the image to disk
  • power off
  • unplug from provisioning networks
  • plug tenant networks
  • set boot mode
  • power on

Drivers can currently add steps before or after this step. The plan is to split this into multiple core steps for more granular control over the deployment process.

Limitations

Deploy steps are static for a given set of driver interfaces, and are currently all out of band - it is not possible to execute steps on the deployment agent. Finally, the mega step limits ordering of the steps.

Ironic deploy templates

The Ironic deploy templates API was added in the Stein cycle and allows deployment templates to be registered which have:

  • a name, which must be a valid trait
  • a list of deployment steps

For example, a deploy template could be registered via POST /v1/deploy_templates:

{
    "name": "CUSTOM_HYPERTHREADING_ON",
    "steps": [
        {
            "interface": "bios",
            "step": "apply_configuration",
            "args": {
                "settings": [
                    {
                        "name": "LogicalProc",
                        "value": "Enabled"
                    }
                ]
            },
            "priority": 150
        }
    ]
}

This template has a name of CUSTOM_HYPERTHREADING_ON (which is also a valid trait name), and references a deploy step on the bios interface that sets the LogicalProc BIOS setting to Enabled in order to enable Hyperthreading on a node.

Tomorrow’s RAID

In the Stein release we have the deploy templates and steps frameworks, but lack drivers with deploy step implementations to make this useful. As part of the demo for the Bespoke Bare Metal talk, we built and demoed a proof of concept deploy step for configuring RAID during deployment on Dell machines. This code has been polished and is working its way upstream at the time of writing, and has also influenced deploy steps for the HP iLO driver. Thanks to Shivanand Tendulker for extracting and polishing some of the code from the PoC.

We now have an apply_configuration deploy step available on the RAID interface which accepts RAID configuration as an argument, to avoid the separate API call required in cleaning.

The first pass at implementing this in the iDRAC driver took over 30 minutes to complete deployment. This was streamlined to just over 10 minutes by combining deletion and creation of virtual disks into a single deploy step, and avoiding an unnecessary reboot.

End to end flow

Now we know what a deploy template looks like, how are they used?

First of all, the cloud operator creates deploy templates via the Ironic API to execute deploy steps for allowed actions. In this example, we have a deploy template used to create a 42GB RAID1 virtual disk.

cat << EOF > raid1-steps.json
[
    {
        "interface": "raid",
        "step": "apply_configuration",
        "args": {
            "raid_config": {
                "logical_disks": [
                    {
                        "raid_level": "1",
                        "size_gb": 42,
                        "is_root_volume": true
                    }
                ]
            }
        },
        "priority": 150
    }
]
EOF

openstack baremetal deploy template create \
    CUSTOM_RAID1 \
    --steps raid1-steps.json

Next, the operator creates Nova flavors or Glance images with required traits that reference the names of deploy templates.

openstack flavor create raid1 \
    --property resources:VCPU=0 \
    --property resources:MEMORY_MB=0 \
    --property resources:DISK_GB=0 \
    --property resources:CUSTOM_COMPUTE=1 \
    --property trait:CUSTOM_RAID1=required

Finally, a user creates a bare metal instance using one of these flavors that is accessible to them.

openstack server create \
    --name test \
    --flavor raid1 \
    --image centos7 \
    --network mynet \
    --key-name mykey

What happens? A bare metal node is scheduled by Nova which has all of the required traits from the flavor and/or image. Those traits are then used by Ironic to find deploy templates with matching names, and the deploy steps from those templates are executed in addition to the core step, in an order determined by their priorities. In this case, the RAID apply_configuration deploy step runs before the core step because it has a higher priority.

Future Challenges

There is still work to be done to improve the flexibility of bare metal deployment. We need to split out the mega step. We need to support executing steps in the agent running on the node, which would enable deployment-time use of the software RAID support recently developed by Arne Wiebalck from CERN.

Drivers need to expose more deploy steps for BIOS, RAID and other functions. We should agree on how to handle executing a step multiple times, and all the tricky corner cases involved.

We have discussed the Nova use case here, but we could also make use of deploy steps in standalone mode, by passing a list of steps to execute to the Ironic provision API call, similar to manual cleaning. There is also a spec proposed by Madhuri Kumari which would allow reconfiguring active nodes to do things like tweak BIOS settings without requiring redeployment.

Thanks to everyone who has been involved in designing, developing and reviewing the series of features in Nova and Ironic that got us this far. In particular John Garbutt who proposed the specs for deploy steps and deploy templates, and Ruby Loo who implemented the deploy steps framework.

by Mark Goddard at September 23, 2019 11:00 AM

September 19, 2019

OpenStack Superuser

Unleashing the Open Infrastructure Potentials at OpenInfra Days Vietnam 2019

Hosted in Hanoi and organized by the Vietnam OpenInfra User Group (VOI), Vietnam Internet Association (VIA), and VFOSSA, the second Vietnam OpenInfra Days exceeded expectations, selling out in two weeks and attracting an influx of sponsorship offers until one week before the event. Broadening the focus to open infrastructure, the event attracted 300 people to the morning sessions and 500 people to the afternoon (open) sessions. Attendees represented more than 90 companies including telcos, cloud, and mobile application providers, who have been applying open source technologies to run their cloud infrastructure and are seeking to unleash its potential to increase flexibility, efficiency, and ease of management.

VOID 2019 morning session and exhibition booths

Structured around container technologies, automation, and security, the agenda featured 25 sessions, including case studies, demos, and tutorials. In their talks, the speakers (solution architects, software architects, and DevOps engineers) shared their experiences and best practices from building and running customers' infrastructure (and their own) using OpenStack, Kubernetes, CI/CD, etc. Many heated discussions carried over into the breaks and the gala dinner, showing immense interest in open infrastructure.

“The event, in general, is a playground for open source developers, particularly in open infrastructure. In addition, through the event we would like to bring real case studies which are happening in the world to Vietnam so that companies in Vietnam who have been applying open source can see the general trend of the world, as well as make them more confident in open source-based product orientation,” said Tuan Huu Luong, one of the founders of OpenInfra User Group Vietnam, in an interview with VTC1, a national broadcaster.

Local news coverage of the Vietnam OpenInfra Days 2019

Though officially an OSF User Group meetup, the Vietnam OpenInfra Day (VOID) is the largest event on cloud computing and ICT infrastructure in Vietnam. The second edition this year also showed the impact of the Vietnam OpenInfra community in the region, with sponsors from Korea, Japan, Singapore, and Taiwan, and half of the speakers coming from all over the world. Accordingly, the organizing team produced a rich program for speakers, sponsors, and attendees.

A warm welcome to speakers and sponsors was organized at the pre-event party in a local brewery, where discussions and opinions on open infrastructure and trends were exchanged. A five star lunch buffet at the event venue, InterContinental Hanoi, provided a pleasant occasion for attendees to meet up and network. Finally, the gala dinner in an authentic Vietnamese restaurant offered a chance to finally close the OpenInfra discussions and introduce the international friends to the Vietnamese food culture, and of course, the noisy drinking culture “Uong Bia Di, 1-2-3 Zooo!” The photo gallery of the event can be found here.

The Vietnam OpenInfra team is impressed by and thankful for the large turnout from the OpenInfra Korea User Group, even though the plan to co-organize the event fell through due to a lack of time. However, a plan for co-organizing a Korea OpenInfra Meetup was worked out during the event, and the Korean attendees clearly enjoyed it very much.

Korea OpenInfra User Group at the VOID 2019

Last but not least, the success of the event owes much to the constant support of the OpenStack Foundation (OSF), which was a silver sponsor this year, and especially to the participation of OSF members in organizing the OpenStack Upstream Institute training in Hanoi following the main event. Ildiko Vancsa, Kendall Nelson, and volunteer trainers from the Vietnam and Korea User Groups delivered a surprisingly fun and productive training day to a new generation of contributors from Vietnam.

OpenStack Upstream Institute Training Hanoi

Time to say goodbye to VOID 2019. See you again at the next VOID; until then, we will celebrate the open infrastructure community's achievements with a series of events, starting with the Korea Meetup in October (TBD)!

VOID 2019 (left) and OpenStack Upstream Institute (right) organizing teams

The post Unleashing the Open Infrastructure Potentials at OpenInfra Days Vietnam 2019 appeared first on Superuser.

by Trinh Nguyen at September 19, 2019 01:00 AM

September 18, 2019

Adam Spiers

Improving trust in the cloud with OpenStack and AMD SEV

This post contains an exciting announcement, but first I need to provide some context!

Ever heard that joke “the cloud is just someone else’s computer”?

Coffee mug saying "There is no cloud. It's just someone else's computer"

Of course it’s a gross over-simplification, but there’s more than a grain of truth in it. And that raises the question: if your applications are running in someone else’s data-centre, how can you trust that they’re not being snooped upon, or worse, invasively tampered with?

Until recently, the answer was “you can’t”. Well, that’s another over-simplification. You could design your workload to be tamperproof; for example even if individual mining nodes in Bitcoin or Ethereum are compromised, the blockchain as a whole will resist the attack just fine. But there’s still the snooping problem.

Hardware to the rescue?

However, there’s some good news on this front. Intel and AMD realised this was a problem, and have both introduced new hardware capabilities to help improve the level to which cloud users can trust the environment in which their workloads are executed, e.g.:

  • AMD SEV (Secure Encrypted Virtualization), which can encrypt the memory of a running VM with a key that is only accessible to the owner of that VM. This is done on-chip, so even someone with physical access to the machine will find it a lot harder to snoop in on the running VM [1].

    It can also provide the guest owner with an attestation which cryptographically proves that the memory was encrypted correctly and can only be decrypted by the owner.

  • Intel MKTME (Multi-Key Total Memory Encryption) which is a similar approach.

But even with that hardware support, there is still the question of the degree to which anyone can trust public clouds run on proprietary technology. There is a growing awareness that Free (Libre) / Open Source Software tends to be inherently more secure and trustworthy, since its transparency enables unlimited peer review, and its openness allows anyone to contribute improvements.

And these days, OpenStack is pretty much the undisputed king of the Open Source cloud infrastructure world.

An exciting announcement

So I’m delighted to be able to announce a significant step forward in trustworthy cloud computing: as of this week, OpenStack is now able to launch VMs with SEV enabled! (Given the appropriate AMD hardware, of course.)

The new hw:mem_encryption flavor extra spec

The core functionality is all merged and will be in the imminent Train release. You can read the documentation, and you will also find it mentioned in the Nova Release Notes.

While this is “only” an MVP and far from the end of the journey (see below), it’s an important milestone in a strong partnership between my employer SUSE and AMD. We started work on adding SEV support into OpenStack around a year ago:

The original blueprint for integrating AMD SEV into nova

This resulted in one of the most in-depth technical specification documents I’ve ever had to write, plus many months of intense collaboration on the code and several changes in design along the way.

SEV code reviews. Click to view in Gerrit!

I’d like to thank not only my colleagues at SUSE and AMD for all their work so far, but also many members of the upstream OpenStack community, especially the Nova team. In particular I enjoyed fantastic support from the PTL (Project Technical Lead) Eric Fried, and several developers at Red Hat, which I think speaks volumes to how well the “coopetition” model works in the Open Source world.

The rest of this post gives a quick tour of the implementation via screenshots and brief explanations, and then concludes with what’s planned next.

OpenStack’s Compute service (nova) will automatically detect the presence of the SEV feature on any compute node which is configured to support it. You can optionally configure how many slots are available on the memory controller for encryption keys. One is used for each guest, so this effectively acts as the maximum number of guest VMs which can concurrently use SEV. Here you can see the configuration of this option, and how nova handles the inventory. Note that it also registers an SEV trait on the compute host, so that in the future if the cloud has a mix of hardware offering different guest memory encryption technologies, you’ll be able to choose which one you want for any given guest, if you need to.

Inventorying the SEV feature.
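As a rough sketch of how that option might look in nova.conf on an SEV-capable compute node (assuming the Train option name num_memory_encrypted_guests in the libvirt section; the value shown is illustrative):

[libvirt]
# Maximum number of concurrent SEV guests; each one consumes a
# memory controller key slot.
num_memory_encrypted_guests = 15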

SEV can be enabled by the operator by adding a new hw:mem_encryption “extra spec” which is a property on nova’s flavors. As already shown in the screenshot above, this can be done through Horizon, OpenStack’s web dashboard. However it can also be set per-image via a similarly-named property hw_mem_encryption:

Enabling SEV via image property in Horizon.

and of course this can all be done via the command-line too:

Enabling SEV via CLI. Click for full size.

Notice the presence of a few other image properties which are crucial for SEV to function correctly. (These are explained fully in the documentation.)
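For readers without the screenshots to hand, the equivalent CLI steps might look roughly like the following. The flavor and image names are hypothetical, and the extra image properties reflect the UEFI firmware and q35 machine type requirements covered in the documentation:

# Request memory encryption via the flavor...
openstack flavor set sev-flavor --property hw:mem_encryption=true

# ...or via the image, which also needs UEFI firmware and the q35 machine type
openstack image set sev-image \
    --property hw_mem_encryption=true \
    --property hw_firmware_type=uefi \
    --property hw_machine_type=q35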

Once booted, an SEV VM instance looks and behaves pretty much like any other OpenStack VM:

SEV instances listed in Horizon

However there are some limitations, e.g. it cannot yet be live-migrated or suspended:

Limitations of SEV instances

Behind the scenes, nova takes care of quite a few important details in how the VM is configured in libvirt. Firstly it performs sanity checks on the flavor and image properties. Then it adds a crucial new <launchSecurity> element:

The <launchSecurity> element in the libvirt domain XML

and also enables IOMMU for virtio devices:

Enabling IOMMU for virtio devices

What’s next?

This area of technology is new and rapidly evolving, so there is still plenty of work left to be done, especially on the software side.

Of course we’ll be adding this functionality to SUSE OpenStack Cloud, initially as a technical preview for our customers to try out.

Probably the most important feature needed next on the SEV side is the ability to verify the attestation which cryptographically proves that the memory was encrypted correctly and can only be decrypted by the owner. In addition, specification of the work required to add support for Intel’s MKTME to OpenStack has already started, so I would expect that to continue.

Footnotes:

[1] There are still potential attacks, e.g. snooping unencrypted memory cache or CPU registers. Work by AMD and others is ongoing to address these.


The post Improving trust in the cloud with OpenStack and AMD SEV appeared first on Structured Procrastination.

by Adam at September 18, 2019 11:32 AM

September 17, 2019

StackHPC Team Blog

Migrating a running OpenStack to containerisation with Kolla

Deploying OpenStack infrastructures with containers brings many operational benefits, such as isolation of dependencies and repeatability of deployment, in particular when coupled with a CI/CD approach. The Kolla project provides tooling that helps deploy and operate containerised OpenStack deployments. Configuring a new OpenStack cloud with Kolla containers is well documented and can benefit from the sane defaults provided by the highly opinionated Kolla Ansible subproject. However, migrating existing OpenStack deployments to Kolla containers can require a more ad hoc approach, particularly to minimise impact on end users.

We recently helped an organization migrate an existing OpenStack Queens production deployment to a containerised solution using Kolla and Kayobe, a subproject designed to simplify the provisioning and configuration of bare-metal nodes. This blog post describes the migration strategy we adopted in order to reduce impact on end users and shares what we learned in the process.

Existing OpenStack deployment

The existing cloud was running the OpenStack Queens release deployed using CentOS RPM packages. This cloud was managed by a control plane of 16 nodes, with each service deployed over two (for OpenStack services) or three (for Galera and RabbitMQ) servers for high availability. Around 40 hypervisor nodes from different generations of hardware were available, resulting in a heterogeneous mix of CPU models, amount of RAM, and even network interface names (with some nodes using onboard Ethernet interfaces and others using PCI cards).

A separate Ceph cluster was used as a backend for all OpenStack services requiring large amounts of storage: Glance, Cinder, Gnocchi, and also disks of Nova instances (i.e. none of the user data was stored on hypervisors).

A new infrastructure

With a purchase of new control plane hardware also being planned, we advised the following configuration, based on our experience and recommendations from Kolla Ansible:

  • three controller nodes hosting control services like APIs and databases, using an odd number for quorum
  • two network nodes hosting Neutron agents along with HAProxy / Keepalived
  • three monitoring nodes providing centralized logging, metrics collection and alerting, a feature which was critically lacking from the existing deployment

Our goal was to migrate the entire OpenStack deployment to use Kolla containers and be managed by Kolla Ansible and Kayobe, with control services running on the new control plane hardware and hypervisors reprovisioned and reconfigured, with little impact on users and their workflows.

Migration strategy

Using a small-scale candidate environment, we developed our migration strategy. The administrators of the infrastructure would install CentOS 7 on the new control plane, using their existing provisioning system, Foreman. We would configure the host OS of the new nodes with Kayobe to make them ready to deploy Kolla containers: configure multiple VLAN interfaces and networks, create LVM volumes, install Docker, etc.

We would then deploy OpenStack services on this control plane. To reduce the risk of the migration, our strategy was to progressively reconfigure the load balancers to point to the new controllers for each OpenStack service while validating that they were not causing errors. If any issue arose, we would be able to quickly revert to the API services running on the original control plane. Fresh Galera, Memcached, and RabbitMQ clusters would also be set up on the new controllers, although the existing ones would remain in use by the OpenStack services for now. We would then gradually shut down the original services after making sure that all resources are managed by the new OpenStack services.

Then, during a scheduled downtime, we would copy the content of the SQL database, reconfigure all services (on the control plane and also on hypervisors) to use the new Galera, Memcached, and RabbitMQ clusters, and move the virtual IP of the load balancer over to the new network nodes, where HAProxy and Keepalived would be deployed.

The animation below depicts the process of migrating from the original to the new control plane, with only a subset of the services displayed for clarity.

Migration from the original to the new control plane

Finally, we would use live migration to free up several hypervisors, redeploy OpenStack services on them after reprovisioning, and live migrate virtual machines back on them. The animation below shows the transition of hypervisors to Kolla:

Migration of hypervisors to Kolla

Tips & Tricks

Having described the overall migration strategy, we will now cover tasks that required special care and provide tips for operators who would like to follow the same approach.

Translating the configuration

In order to make the migration seamless, we wanted to keep the configuration of services deployed on the new control plane as close as possible to the original configuration. In some cases, this meant moving away from Kolla Ansible's sane defaults and making use of its extensive customisation capabilities. In this section, we describe how to integrate an existing configuration into Kolla Ansible.

The original configuration management tool kept entire OpenStack configuration files under source control, with unique values templated using Jinja. The existing deployment had been upgraded several times, and configuration files had not been updated with deprecation and removal of some configuration options. In comparison, Kolla Ansible uses a layered approach where configuration generated by Kolla Ansible itself is merged with additions or overrides specified by the operator either globally, per role (nova), per service (nova-api), or per host (hypervisor042). This has the advantage of reducing the amount of configuration to check at each upgrade, since Kolla Ansible will track deprecation and removals of the options it uses.
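As an illustration of that layering (assuming Kolla Ansible's default node_custom_config directory of /etc/kolla/config; Kayobe keeps the equivalent files under etc/kayobe/kolla/config), operator overrides for Nova could be placed as follows:

/etc/kolla/config/global.conf                     # merged into all services
/etc/kolla/config/nova.conf                       # all Nova services
/etc/kolla/config/nova/nova-api.conf              # nova-api only
/etc/kolla/config/nova/hypervisor042/nova.conf    # a single host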

The oslo-config-validator tool from the oslo.config project helps with the task of auditing an existing configuration for outdated options. While introduced in Stein, it may be possible to run it against older releases if the API has not changed substantially. For example, to audit nova.conf using code from the stable/queens branch:

$ git clone -b stable/queens https://opendev.org/openstack/nova.git
$ cd nova
$ tox -e venv -- pip install --upgrade oslo.config # Update to the latest oslo.config release
$ tox -e venv -- oslo-config-validator --config-file etc/nova/nova-config-generator.conf --input-file /etc/nova/nova.conf

This would output messages identifying removed and deprecated options:

ERROR:root:DEFAULT/verbose not found
WARNING:root:Deprecated opt DEFAULT/notify_on_state_change found
WARNING:root:Deprecated opt DEFAULT/notification_driver found
WARNING:root:Deprecated opt DEFAULT/auth_strategy found
WARNING:root:Deprecated opt DEFAULT/scheduler_default_filters found

Once updated to match the deployed release, all the remaining options could be moved to a role configuration file used by Kolla Ansible. However, we preferred to audit each one against Kolla Ansible templates, such as nova.conf.j2, to avoid keeping redundant options and detect any potential conflicts. Future upgrades will be made easier by reducing the amount of custom configuration compared to Kolla Ansible's defaults.

Templating also needs to be adapted from the original configuration management system. Kolla Ansible relies on Jinja which can use variables set in Ansible. However, when called from Kayobe, extra group variables cannot be set in Kolla Ansible's inventory, so instead of cpu_allocation_ratio = {{ cpu_allocation_ratio }} you would have to use a different approach:

{% if inventory_hostname in groups['compute_big_overcommit'] %}
cpu_allocation_ratio = 16.0
{% elif inventory_hostname in groups['compute_small_overcommit'] %}
cpu_allocation_ratio = 4.0
{% else %}
cpu_allocation_ratio = 1.0
{% endif %}

Configuring Kolla Ansible to use existing services

We described earlier that our migration strategy was to progressively deploy OpenStack services on the new control plane while using the existing Galera, Memcached, and RabbitMQ clusters. This section explains how this can be configured with Kayobe and Kolla Ansible.

In Kolla Ansible, many deployment settings are configured in ansible/group_vars/all.yml, including the RabbitMQ transport URL (rpc_transport_url) and the database connection (database_address).

An operator can override these values from Kayobe using etc/kayobe/kolla/globals.yml:

rpc_transport_url: rabbit://username:password@ctrl01:5672,username:password@ctrl02:5672,username:password@ctrl03:5672

Another approach is to populate the groups that Kolla Ansible uses to generate these variables. In Kayobe, we can create an extra group for each existing service (e.g. ctrl_rabbitmq), populate it with existing hosts, and customise the Kolla Ansible inventory to map services to them.

In etc/kayobe/kolla.yml:

kolla_overcloud_inventory_top_level_group_map:
  control:
    groups:
      - controllers
  network:
    groups:
      - network
  compute:
    groups:
      - compute
  monitoring:
    groups:
      - monitoring
  storage:
    groups:
      "{{ kolla_overcloud_inventory_storage_groups }}"
  ctrl_rabbitmq:
    groups:
      - ctrl_rabbitmq

kolla_overcloud_inventory_custom_components: "{{ lookup('template', kayobe_config_path ~ '/kolla/inventory/overcloud-components.j2') }}"

In etc/kayobe/inventory/hosts:

[ctrl_rabbitmq]
ctrl01 ansible_host=192.168.0.1
ctrl02 ansible_host=192.168.0.2
ctrl03 ansible_host=192.168.0.3

We copy overcloud-components.j2 from the Kayobe source tree to etc/kayobe/kolla/inventory/overcloud-components.j2 in our kayobe-config repository and customise it:

[rabbitmq:children]
ctrl_rabbitmq

[outward-rabbitmq:children]
ctrl_rabbitmq

While better integrated with Kolla Ansible, this approach should be used with care so that the original control plane is not reconfigured in the process. Operators can use the --limit and --kolla-limit options of Kayobe to restrict Ansible playbooks to specific groups or hosts.

Customising Kolla images

Even though Kolla Ansible can be configured extensively, it is sometimes required to customise Kolla images. For example, we had to rebuild the heat-api container image so it would use a different Keystone domain name: Kolla uses heat_user_domain while the existing deployment used heat.

Once a modification has been pushed to the Kolla repository configured to be pulled by Kayobe, one can simply rebuild images with the kayobe overcloud container image build command.

Deploying services on the new control plane

Before deploying services on the new control plane, it can be useful to double-check that our configuration is correct. Kayobe can generate the configuration used by Kolla Ansible with the following command:

$ kayobe overcloud service configuration generate --node-config-dir /tmp/kolla

To deploy only specific services, the operator can restrict Kolla Ansible to specific roles using tags:

$ kayobe overcloud service deploy --kolla-tags glance

Migrating resources to new services

Most OpenStack services will start managing existing resources immediately after deployment. However, a few require manual intervention from the operator to perform the transition, particularly when services are not configured for high availability.

Cinder

Even when volume data is kept on a distributed backend like a Ceph cluster, each volume can be associated with a specific cinder-volume service. The service can be identified from the os-vol-host-attr:host field in the output of openstack volume show.

$ openstack volume show <volume_uuid> -c os-vol-host-attr:host -f value
ctrl01@rbd

There is a cinder-manage command that can be used to migrate volumes from one cinder-volume service to another:

$ cinder-manage volume update_host --currenthost ctrl01@rbd --newhost newctrl01@rbd

However, there is no command to migrate only specific volumes, so if you are migrating to a larger number of cinder-volume services, some will have no volumes to manage until the Cinder scheduler allocates new volumes to them.

Do not confuse this command with cinder migrate which is designed to transfer volume data between different backends. Be advised that when the destination is a cinder-volume service using the same Ceph backend, it will happily delete your volume data!

Neutron

Unless Layer 3 High Availability is configured in Neutron, routers will be assigned to a specific neutron-l3-agent service. The existing service can be replaced with the commands:

$ openstack network agent remove router --l3 <old-agent-uuid> <router-uuid>
$ openstack network agent add router --l3 <new-agent-uuid> <router-uuid>

Similarly, you can use the openstack network agent remove network --dhcp and openstack network agent add network --dhcp commands for DHCP agents.
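For example (agent and network UUIDs are placeholders):

$ openstack network agent remove network --dhcp <old-agent-uuid> <network-uuid>
$ openstack network agent add network --dhcp <new-agent-uuid> <network-uuid>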

Live migrating instances

In addition to the new control plane, several additional compute hosts were added to the system, in order to provide free resources that could host the first batch of live migrated instances. Once configured as Nova hypervisors, we discovered that we could not migrate instances to them because CPU flags didn't match, even though source hypervisors were using the same hardware.

This was caused by a mismatch in BIOS versions: the existing hypervisors in production had been updated to the latest BIOS to protect against the Spectre and Meltdown vulnerabilities, but these new hypervisors had not, resulting in different CPU flags.

This is a good reminder that in a heterogeneous infrastructure, operators should check the cpu_mode used by Nova. Kashyap Chamarthy's talk on effective virtual CPU configuration in Nova gives a good overview of available options.
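As a sketch (not the exact values used in this migration), pinning a common CPU model via Kolla Ansible's nova.conf overrides could look like this on a Queens-era deployment:

[libvirt]
# Use a named CPU model available on every hypervisor generation,
# rather than host-model/host-passthrough, so that differing CPU
# flags do not block live migration.
cpu_mode = custom
cpu_model = Haswell-noTSX-IBRS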

What about downtime?

While we wanted to minimize the impact on end users and their workflow, there were no critical services running on this cloud that would have needed a zero downtime approach. If it had been a requirement, we would have explored dynamically adding new control plane nodes to the existing clusters before removing the old ones. Instead, it was a welcome opportunity to reinitialize the configuration of several critical components to a clean slate.

The road ahead

This OpenStack deployment is now ready to benefit from all the improvements developed by the Kolla community, which released Kolla 8.0.0 and Kolla Ansible 8.0.0 for the Stein cycle earlier this summer and Kayobe 6.0.0 at the end of August. The community is now actively working on releases for OpenStack Train.

If you would like to get in touch we would love to hear from you. Reach out to us via Twitter or directly via our contact page.

by Pierre Riteau at September 17, 2019 02:00 PM

September 16, 2019

OpenStack Superuser

Must-see Containers Sessions at the Open Infrastructure Summit Shanghai

Join the open source community at the Open Infrastructure Summit Shanghai. The Summit schedule features over 100 sessions organized by use cases including:  container infrastructure, artificial intelligence and machine learning, high performance computing, 5G, edge computing, network functions virtualization, and public, private and multi-cloud strategies. 

Here we’re highlighting some of the sessions you don’t want to miss about container infrastructure. Check out the full list of sessions from this track here

Kata Containers: a Cornerstone for Financial Grade Cloud Native Infrastructure

In 2017, the Kata Containers project was formed out of the code bases contributed by Intel Clear Containers and Hyper.SH runV. A year and a half later, the OpenStack Foundation confirmed Kata Containers as a top level OpenInfra project and became a de facto standard of open source virtualized container technology. Meanwhile, Hyper.sh joined forces with Ant Financial to build the CloudNative infrastructure for financial services based on secure containers. 

During this session, Ant Financial’s Xu Wang will focus on an introduction to Kata Containers and AntFin’s secure containers practice. 

Keystone as The Authentication Center for OpenStack and Kubernetes

With the increase of container services, cloud platforms cannot meet the needs of customers due to only providing virtual machine and bare metal services. Customers need to be able to consume all three services so a unified user management and authentication system is a necessity. H3C Technologies decided to use Keystone as their user management and authentication service. Jun Gu and James Xu from H3C will cover the following topics during this session: 

  1. Introduction to Keystone
  2. User management and authentication for K8s & OpenStack
  3. Keystone enhancement
  4. Integrated with third parties

Run Kubernetes on OpenStack and Bare Metal fast

Running Kubernetes on top of OpenStack provides high levels of automation and scalability. Kuryr is an OpenStack project that provides a CNI plugin using Neutron and Octavia to deliver networking for pods and services; it is primarily designed for Kubernetes clusters running on OpenStack machines.

Tests were performed to check how Kuryr improves networking performance when running Kubernetes on OpenStack compared to using the OpenShift/OVS SDN. In this session, Ramon Acedo Rodriquez from Red Hat will cover the latest integrations and architecture for running Kubernetes clusters on OpenStack and bare metal. In addition, Rodriquez will discuss the performance improvements gained by using Kuryr as the SDN, showing his test results.


Join the global community, November 4-6 in Shanghai for these sessions and more that can help you create a strategy to solve your organization’s container infrastructure needs.

The post Must-see Containers Sessions at the Open Infrastructure Summit Shanghai appeared first on Superuser.

by Kendall Waters at September 16, 2019 02:43 PM

Nate Johnston

Calendar Merge

I work in the OpenStack community, which is a broad confederation of many teams working on projects that together compose an open source IaaS cloud. With a project of such magnitude, there are a lot of meetings, which in the OpenStack world take place on Freenode IRC. The OpenStack community has set up an automated system to schedule, manage, and log these meetings. You can see the web front end at Eavesdrop.

September 16, 2019 01:34 AM

September 13, 2019

StackHPC Team Blog

Fabric control in Intel MPI

High Performance Computing usually involves some sort of parallel computing, and process-level parallelisation using the MPI (Message Passing Interface) protocol has been a common approach on "traditional" HPC clusters. Although alternative approaches are gaining some ground, getting good MPI performance will continue to be crucially important for many big scientific workloads even in a cloudy new world of software-defined infrastructure.

There are several high-quality MPI implementations available and deciding which one to use is important as applications must be compiled against specific MPI libraries - the different MPI libraries are (broadly) source-compatible but not binary-compatible. Unfortunately selecting the "right" one to use is not straightforward as a search for benchmarks will quickly show, with different implementations coming out on top in different situations. Intel's MPI has historically been a strong contender, with easy "yum install" deployment, good performance (especially on Intel processors), and being - unlike Intel's compilers - free to use. Intel MPI 2018 remains relevant even for new installs as the 2019 versions have had various issues, including the fairly-essential hydra manager appearing not to work with at least some AMD processors. A fix for this is apparently planned for 2019 update 5 but there is no release date for this yet.

MPI can run over many different types of interconnect or "fabrics" that are actually carrying the inter-process communications, such as Ethernet, InfiniBand etc. and the Intel MPI runtime will, by default, automatically try to select a fabric which works. Knowing how to control fabric choices is however still important as there is no guarantee it will select the optimal fabric, and fall-back through non-working options can lead to slow startup or lots of worrying error messages for the user.

Intel significantly changed the fabric control between 2018 and 2019 MPI versions but this isn't immediately obvious from the changelog and you have to jump about between the developer references and developer guides to get the full picture. In both MPI versions the I_MPI_FABRICS environment variable specifies the fabric, but the values it takes are quite different:

  • For 2018 options are shm, dapl, tcp, tmi, ofa or ofi, or you can use x:y to control intra- and inter-node communications separately (see the docs for which combinations are valid).
  • For 2019 options are only ofi, shm:ofi or shm, with the 2nd option setting intra- and inter-node communications separately as before.

The most generally-useful options are probably:

  • shm (2018 & 2019): The shared memory transport; only applicable to intra-node communication so generally used with another transport as suggested above - see the docs for details.
  • tcp (2018 only): A TCP/IP capable fabric e.g. Ethernet or IB via IPoIB.
  • ofi (2018 & 2019): An "OpenFabrics Interfaces-capable fabric". These use a library called libfabric (either an Intel-supplied or "external" version) which provides a fixed application-facing API while talking to one of several "OFI providers" which communicate with the interconnect hardware. Really your choice of provider here depends on the hardware, with possibilities being:
    • psm2: Intel OmniPath
    • verbs: InfiniBand or iWARP
    • RxM: A utility provider supporting verbs
    • sockets: Again a TCP/IP-capable fabric, but this time through libfabric. It's not intended to be faster than the 2018 tcp option, but allows developing/debugging libfabric code without actually having a faster interconnect available.

With both 2018 and 2019 you can use I_MPI_OFI_PROVIDER_DUMP=enable to see which providers MPI thinks are available.
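For example, to pin the fabric explicitly and dump the providers libfabric has detected at launch (the application name is a placeholder):

I_MPI_FABRICS=shm:ofi I_MPI_OFI_PROVIDER_DUMP=enable mpirun -n 4 ./my_mpi_app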

2018 also supported some additional options which have gone away in 2019:

  • ofa (2018): "OpenFabrics Alliance" e.g. InfiniBand (through OFED Verbs) & possibly also iWARP and RoCE?
  • dapl (2018): "Direct Access Programming Library" e.g. InfiniBand and iWARP.
  • tmi (2018): "Tag Matching Interface" e.g. Intel True Scale Fabric, Intel Omni-Path Architecture, Myrinet

With any of these fabrics there are additional variables to tweak things. 2018 has I_MPI_FABRICS_LIST which allows specification of a list of available fabrics to try, plus variables to control fallback through this list. These variables are all gone in 2019 now that there are fewer fabric options. Intel have clearly decided to concentrate on OFA/libfabric, which unifies (or restricts, depending on your view!) the application-facing interface.

If you're using the 2018 MPI over InfiniBand you might be wondering which option to use; at least back in 2012 performance between DAPL and OFA/OFED Verbs was apparently generally similar although the transport options available varied, so which is usable/best if both are available will depend on your application and hardware.

HPC Fabrics in the Public Cloud

Hybrid and public cloud HPC solutions have been gaining increasing attention, with scientific users looking to burst peak usage out to the cloud, or investigating the impact of wholesale migration.

Azure have been pushing their capabilities for HPC hard recently, showcasing ongoing work to get closer to bare-metal performance and launching a 2nd generation of "HB-series" VMs which provide 120 cores of AMD Epyc 7002 processors. With InfiniBand interconnects and as many as 80,000 cores of HBv2 available for jobs for (some) customers, Azure looks to be providing pay-as-you-go access to some very serious (virtual) hardware. And in addition to providing a platform for new HPC workloads in the cloud, for organisations which are already embedded in the Microsoft ecosystem Azure may seem an obvious route to acquiring a burst capacity for on-premises HPC workloads.

If you're running in a virtualised environment such as Azure, MPI configuration is likely to have additional complexities and a careful read of any and all documentation you can get your hands on is likely to be needed.

For example for Azure, the recommended Intel MPI settings described here, here and in the suite of pages here vary depending on which type of VM you are using:

  • Standard and most compute-optimised nodes only have Ethernet (needing tcp or sockets) which is likely to make them uninteresting for multi-node MPI jobs.
  • Hr-series VMs and some others have FDR InfiniBand but need specific drivers (provided in an Azure image), Intel MPI 2016 and the DAPL provider set to ofa-v2-ib0.
  • HC44 and HB60 VMs have EDR InfiniBand and can theoretically use any MPI (although for HB60 VMs note the issues with Intel 2019 MPI on AMD processors mentioned above) but need the appropriate fabric to be manually set.

InfiniBand on Azure still seems to be undergoing considerable development with for example new drivers for MVAPICH2 coming out around now so treat any guidance with a pinch of salt until you know it's not stale, to mix metaphors!

---

If you would like to get in touch we would love to hear from you. Reach out to us on Twitter or directly via our contact page.

by Steve Brasier at September 13, 2019 03:30 PM

Chris Dent

Placement Update 19-36

Here's placement update 19-36. There won't be one next week, I will be away. Because of my forthcoming "less time available for OpenStack" I will also be stopping these updates at some point in the next month or so so I can focus the limited time I will have on reviewing and coding. There will be at least one more.

Most Important

The big news this week is that after returning from a trip (that meant he was away during the nomination period) Tetsuro has stepped up to be the PTL for placement in Ussuri. Thanks very much to him for taking this up, I'm sure he will be excellent.

We need to work on useful documentation for the features developed this cycle.

I've also made a now worklist in StoryBoard to draw attention to placement project stories that are relevant to the next few weeks, making it easier to ignore those that are not relevant now, but may be later.

Stories/Bugs

(Numbers in () are the change since the last pupdate.)

There are 23 (-1) stories in the placement group. 0 (0) are untagged. 5 (0) are bugs. 4 (0) are cleanups. 10 (-1) are rfes. 5 (1) are docs.

If you're interested in helping out with placement, those stories are good places to look.

osc-placement

  • https://review.opendev.org/666542 Add support for multiple member_of. There's been some useful discussion about how to achieve this, and a consensus has emerged on how to get the best results.

Main Themes

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting.

  • https://review.opendev.org/#/q/topic:bp/support-consumer-types This has some good comments on it from melwitt. I'm going to be away next week, so if someone else would like to address them that would be great. If it is deemed fit to merge, we should, despite feature freeze passing, since we haven't had much churn lately. If it doesn't make it in Train, that's fine too. The goal is to have it ready for Nova in Ussuri as early as possible.

Cleanup

Cleanup is an overarching theme related to improving documentation, performance and the maintainability of the code. The changes we are making this cycle are fairly complex to use and are fairly complex to write, so it is good that we're going to have plenty of time to clean and clarify all these things.

Performance related explorations continue:

One outcome of the performance work needs to be something like a Deployment Considerations document to help people choose how to tweak their placement deployment to match their needs. The simple answer is use more web servers and more database servers, but that's often very wasteful.

Other Placement

Miscellaneous changes can be found in the usual place.

There are three os-traits changes being discussed. And two os-resource-classes changes. The latter are docs-related.

Other Service Users

New reviews are added to the end of the list. Reviews that haven't had attention in a long time (boo!) or have merged or approved (yay!) are removed.

End

🐈

by Chris Dent at September 13, 2019 11:18 AM

September 10, 2019

Aptira

10th Birthday + 10% off!

Aptira 10 year birthday 10% off sale

It’s our 10th birthday – and you get the presents!

Did you know that Aptira was founded at 9 minutes past 9, on the 9th day of the 9th month, in 2009? 2009 was also the year that NASA launched the final space shuttle mission to the Hubble Telescope. Great things happened on this day, with the founding of Aptira being no exception.

Yesterday we turned 10! We wouldn’t be here if it wasn’t for our amazing customers. So to celebrate, we are offering 10% off all our services from the 10th of September until the 10th of October. That’s 10% off managed services, 10% off training, 10% off everything except hardware. This 10% discount also applies to pre-paid services, so you can pre-pay for the next 12 months to really maximise your savings!

And for the extra icing on the cake (even though it doesn’t have a 10 in it), we’ll give you a free 2 hour consulting session to help get you started with transforming your Cloud solution. Chat with a Solutionaut today to take advantage of this once in a decade discount.

The post 10th Birthday + 10% off! appeared first on Aptira.

by Jessica Field at September 10, 2019 01:00 PM

September 09, 2019

OpenStack Superuser

Must-see 5G and edge computing sessions at the Open Infrastructure Summit Shanghai

Creating an edge computing strategy? Looking for reference architectures or a vendor to support your strategy?

Join the people building and operating open infrastructure at the Open Infrastructure Summit Shanghai, November 4-6 where you will come with questions, and leave with an edge computing strategy. The Summit schedule features over 100 sessions covering over 30 open source projects organized by use cases including: artificial intelligence and machine learning, high performance computing, 5G, edge computing, network functions virtualization (NFV), container infrastructure and public, private and multi-cloud strategies.

Here we’re highlighting some of the sessions you’ll want to add to your schedule about edge computing. Check out the entire track here.

Towards Guaranteed Low Latency And High Security Service Based On Mobile Edge Computing (MEC) Technology

As a 5G pioneer, SK Telecom (SKT) has been developing its own MEC platform since last year. It was designed and developed to respond to a variety of business requirements and complies with 3GPP/ETSI standards. To interwork with current 4G/5G technologies, SKT implemented unique edge routing technology and commercialized this MEC platform last year. The platform is linked to their 5G network and is currently providing a smart factory pilot service which requires extremely low latency. This talk will provide an overview of SKT’s MEC architecture, lessons learned from commercialization and their future plans.

Secured Edge Infrastructure For Contactless Payment System

China UnionPay will discuss their StarlingX architecture and how to apply security hardening features on their underlying OpenStack and Kubernetes platform. They will describe an architecture supporting both virtual machine and container resources for the edge payment service application, including face recognition, car license plate detection, payment, and more. Learn more about smart payment requirements and reference implementations for that use case, including capabilities like resource management, security isolation, and more.

Network Function Virtualization Orchestration By Airship

This session will cover how to enable OVS-DPDK in Airship and demonstrate the end-to-end deployment flow of OVS-DPDK in Airship. Moreover, the speakers will present the implementation details like creating DPDK-enabled docker images for OVS, handling hugepage allocation for DPDK in OpenStack-Helm and Kubernetes, CPU pinning, and more.

Join the global community, November 4-6 in Shanghai for these sessions and more that can help you create a strategy to solve your organization’s edge computing needs.

The post Must-see 5G and edge computing sessions at the Open Infrastructure Summit Shanghai appeared first on Superuser.

by Allison Price at September 09, 2019 02:45 PM

CERN Tech Blog

Software RAID support in OpenStack Ironic

The vast majority of the ~15’000 physical servers in the CERN IT data centers rely on Software RAID to protect services from disk failures. With the advent of OpenStack Ironic to manage this bare metal fleet, the CERN cloud team started to work with the upstream community on adding Software RAID support to Ironic’s feature list. Software RAID support in Ironic is now ready to be released with OpenStack’s Train release, but the code, backported to Stein, is already in production on more than 1’000 nodes at CERN.

by CERN (techblog-contact@cern.ch) at September 09, 2019 10:15 AM

September 08, 2019

Christopher Smart

Setting up a monitoring host with Prometheus, InfluxDB and Grafana

Prometheus and InfluxDB are powerful time series database monitoring solutions, both of which are natively supported by the graphing tool Grafana.

Setting up these simple but powerful open source tools gives you a great base for monitoring and visualising your systems. We can use agents like node-exporter to publish metrics on remote hosts which Prometheus will scrape, and other tools like collectd which can send metrics to InfluxDB’s collectd listener (as per my post about OpenWRT).

Prometheus’ node exporter metrics in Grafana

I’m using CentOS 7 on a virtual machine, but this should be similar to other systems.

Install Prometheus

Prometheus is the trickiest to install, as there is no Yum repo available. You can either download the pre-compiled binary or run it in a container; I’ll do the latter.

Install Docker and pull the image (I’ll use Quay instead of Dockerhub).

sudo yum install docker
sudo systemctl start docker
sudo systemctl enable docker
sudo docker pull quay.io/prometheus/prometheus

Let’s create a directory for Prometheus configuration files which we will pass into the container.

sudo mkdir /etc/prometheus.d

Let’s create the core configuration file. This file will set the scraping interval (under global) for Prometheus to pull data from client endpoints and is also where we configure those endpoints (under scrape_configs). As we will enable node-exporter on the monitoring node itself later, let’s add it as a localhost target.

cat << EOF | sudo tee /etc/prometheus.d/prometheus.yml
global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
    - targets:
      - localhost:9100
EOF

Now we can start a persistent container. We’ll pass in the config directory we created earlier but also a dedicated volume so that the database is persistent across updates. We use host networking so that Prometheus can talk to localhost to monitor itself (not required if you want to configure Prometheus to talk to the host’s external IP instead of localhost).

Pass in the path to any custom CA Certificate as a volume (example below) for any endpoints you require. If you want to run this behind a reverse proxy, then set web.external-url to the hostname and port (leave it off if you don’t).

Note that enabling the admin-api and lifecycle will allow anyone on your network to perform those functions, so you may want to only enable them if your network is trusted. Otherwise you should probably put those behind an SSL-enabled, password-protected webserver (out of scope for this post).

Note also that some volumes have either the :z or :Z option appended to them; this sets the SELinux context for the container (shared vs exclusive, respectively).

sudo docker run \
--detach \
--interactive \
--tty \
--network host \
--name prometheus \
--restart always \
--publish 9090:9090 \
--volume prometheus:/prometheus \
--volume /etc/prometheus.d:/etc/prometheus.d:Z \
--volume /path/to/ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:z \
quay.io/prometheus/prometheus \
--config.file=/etc/prometheus.d/prometheus.yml \
--web.external-url=http://$(hostname -f):9090 \
--web.enable-lifecycle \
--web.enable-admin-api

Check that the container is running properly; it should say in the log that it is ready to receive web requests. You should also be able to browse to the endpoint on port 9090 (you can run queries here, but we’ll use Grafana).

sudo docker ps
sudo docker logs prometheus

Updating Prometheus config

Updating and reloading the config is easy: just edit /etc/prometheus.d/prometheus.yml and restart the container. This is useful when adding new nodes to scrape metrics from.
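
For example, to scrape a hypothetical second host (web01.example.com is just a placeholder; substitute a machine that is actually running node_exporter on port 9100), you could extend the node job’s target list like this:

scrape_configs:
  - job_name: 'node'
    static_configs:
    - targets:
      - localhost:9100
      - web01.example.com:9100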

sudo docker restart prometheus

You can also send a message to Prometheus to reload (if you enabled this via the web.enable-lifecycle option).

curl -s -XPOST localhost:9090/-/reload

In the container log (as above) you should see that it has reloaded the config.

Installing Prometheus node exporter

You’ll notice in the Prometheus configuration above we have a job called node and a target for localhost:9100. This is a simple way to start monitoring the monitor node itself! Installing the node exporter in a container is not recommended, so we’ll use the Copr repo and install with Yum.

sudo curl -Lo /etc/yum.repos.d/_copr_ibotty-prometheus-exporters.repo \
https://copr.fedorainfracloud.org/coprs/ibotty/prometheus-exporters/repo/epel-7/ibotty-prometheus-exporters-epel-7.repo

sudo yum install node_exporter
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

It should be listening on port 9100 and Prometheus should start getting metrics from http://localhost:9100/metrics automatically (we’ll see them later with Grafana).
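
As a quick sanity check (nothing more than a curl against the exporter), you can confirm it is responding before looking in Prometheus:

curl -s http://localhost:9100/metrics | head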

Install InfluxDB

Influxdata provides a yum repository so installation is easy!

cat << \EOF | sudo tee /etc/yum.repos.d/influxdb.repo
[influxdb]
name=InfluxDB
baseurl=https://repos.influxdata.com/centos/$releasever/$basearch/stable
enabled=1
gpgcheck=1
gpgkey=https://repos.influxdata.com/influxdb.key
EOF
sudo yum install influxdb

The defaults are fine, other than enabling collectd support so that other clients can send metrics to InfluxDB. I’ll show you how to use this in another blog post soon.

sudo sed -i 's/^\[\[collectd\]\]/#\[\[collectd\]\]/' /etc/influxdb/influxdb.conf
cat << EOF | sudo tee -a /etc/influxdb/influxdb.conf
[[collectd]]
  enabled = true
  bind-address = ":25826"
  database = "collectd"
  retention-policy = ""
  typesdb = "/usr/local/share/collectd"
  security-level = "none"
EOF
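
If InfluxDB is not already running, start and enable it so the new collectd settings take effect (the Influxdata package should install a systemd unit named influxdb; restart instead if it was already running):

sudo systemctl start influxdb
sudo systemctl enable influxdb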

This should open a number of ports, including InfluxDB itself on TCP port 8086 and the collectd receiver on UDP port 25826.

sudo ss -ltunp |egrep "8086|25826"

Create InfluxDB collectd database

Finally, we need to connect to InfluxDB and create the collectd database. Just run the influx command.

influx

And at the prompt, create the database and exit.

CREATE DATABASE collectd
exit
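
If you prefer a non-interactive check (assuming the influx CLI from the package is on your path), you can confirm the database exists with:

influx -execute 'SHOW DATABASES'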

Install Grafana

Grafana has a Yum repository so it’s also pretty trivial to install.

cat << EOF | sudo tee /etc/yum.repos.d/grafana.repo
[grafana]
name=Grafana
baseurl=https://packages.grafana.com/oss/rpm
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
EOF
sudo yum install grafana

Grafana pretty much works out of the box and can be configured via the web interface, so simply start and enable it. The server listens on port 3000 and the default username is admin with password admin.

sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo ss -ltnp |grep 3000

Now you’re ready to log into Grafana!

Configuring Grafana

Browse to the IP of your monitoring host on port 3000 and log into Grafana.

Now we can add our two data sources. First, Prometheus, pointing to localhost on port 9090…

…and then InfluxDB, pointing to localhost on port 8086 and to the collectd database.
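
If you’d rather script this than click through the UI, Grafana’s HTTP API can create the Prometheus data source; a minimal sketch, assuming the default admin:admin credentials:

curl -s -u admin:admin -H "Content-Type: application/json" \
  -X POST http://localhost:3000/api/datasources \
  -d '{"name":"Prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy","isDefault":true}'

The InfluxDB data source can be added the same way, with type influxdb, url http://localhost:8086 and the collectd database name in the payload.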

Adding a Grafana dashboard

Make sure both data sources test OK and we’re well on our way. Next we just need to create some dashboards, so let’s import a dashboard for node exporter and we’ll hopefully at least see the monitoring host itself.

Go to Dashboards and hit import.

Type the number 1860 in the dashboard field and hit load.

This should automatically download and load the dashboard; all you need to do is select your Prometheus data source from the Prometheus drop-down and hit Import!

Next you should see the dashboard with metrics from your monitor node.

So there you go, you’re on your way to monitoring all the things! For anything that supports collectd, you can forward metrics to UDP port 25826 on your monitor node. More on that later…
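
As a teaser for that follow-up, a client’s collectd forwarding config might look something like the sketch below (the drop-in path and server address are assumptions for a typical CentOS collectd install; adjust them for your environment):

cat << EOF | sudo tee /etc/collectd.d/network.conf
LoadPlugin network
<Plugin network>
  Server "monitor.example.com" "25826"
</Plugin>
EOF
sudo systemctl restart collectd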

by Chris at September 08, 2019 12:18 PM

September 06, 2019

Nate Johnston

Joining the TC

While my candidate statement goes into some detail about why I wanted to run for the OpenStack Technical Committee (“TC”), I wanted to write a bit more about it to explain where I am coming from and what I feel I can offer. As far as the TC is concerned, I am “new blood”. I have worked in some positions that are a part of OpenStack community stewardship before - in 2016 I spent a season as one of the election officials, and I have worked previously as an infrastructure liaison from the Neutron community.

September 06, 2019 03:26 PM

Chris Dent

Placement Update 19-35

Let's have a placement update 19-35. Feature freeze is this week. We have a feature in progress (consumer types, see below) but it is not critical.

Most Important

Three main things we should probably concern ourselves with in the immediate future:

  • We are currently without a PTL for Ussuri. There's some discussion about the options for dealing with this in an email thread. If you have ideas (or want to put yourself forward), please share.

  • We need to work on useful documentation for the features developed this cycle.

  • We need to create some cycle highlights. To help with that I've started an etherpad. If I've forgotten anything, please make additions.

What's Changed

  • osc-placement 1.7.0 has been released. This adds support for managing allocation ratios via aggregates, by adding a few different commands and args for inventory manipulation.

  • Work on consumer types exposed that placement needed to be first class in grenade to make sure database migrations are run. That change has merged. Until then placement was upgraded as part of nova.

Stories/Bugs

(Numbers in () are the change since the last pupdate.)

There are 24 (-1) stories in the placement group. 0 (0) are untagged. 5 (0) are bugs. 4 (0) are cleanups. 11 (-1) are rfes. 4 (0) are docs.

If you're interested in helping out with placement, those stories are good places to look.

osc-placement

  • https://review.opendev.org/666542 Add support for multiple member_of. There's been some useful discussion about how to achieve this, and a consensus has emerged on how to get the best results.

  • --amend and --aggregate on resource provider inventory has merged and been released in 1.7.0 (see above).

Main Themes

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting.

Cleanup

Cleanup is an overarching theme related to improving documentation, performance and the maintainability of the code. The changes we are making this cycle are fairly complex to use and are fairly complex to write, so it is good that we're going to have plenty of time to clean and clarify all these things.

Performance related explorations continue:

One outcome of the performance work needs to be something like a Deployment Considerations document to help people choose how to tweak their placement deployment to match their needs. The simple answer is use more web servers and more database servers, but that's often very wasteful.

Other Placement

Miscellaneous changes can be found in the usual place.

  • https://review.opendev.org/676982 Merge request log and request id middlewares is worth attention. It makes sure that all log messages from a single request use a global and local request id.

There are three os-traits changes being discussed. And zero os-resource-classes changes.

Other Service Users

This week (because of feature freeze) I will not be adding new finds to the list, just updating what was already on the list.

End

🐎

by Chris Dent at September 06, 2019 10:53 AM

September 05, 2019

Mirantis

OpenStack vs AWS Total Cost of Ownership: Assumptions behind the TCO Calculator

You may think you know whether OpenStack or AWS is more expensive, but it's a complicated process to decide. Here are some things you need to consider.

by Nick Chase at September 05, 2019 03:22 PM

Galera Cluster by Codership

Galera Cluster with new Galera Replication Library 3.28 and MySQL 5.6.45, MySQL 5.7.27 is GA

Codership is pleased to announce a new Generally Available (GA) release of Galera Cluster for MySQL 5.6 and 5.7, consisting of MySQL-wsrep 5.6.45-25.27 and 5.7.27-25.19 with a new Galera Replication library 3.28 (release notes, download) implementing wsrep API version 25. This release incorporates all changes into MySQL 5.6.45 (release notes, download) and MySQL 5.7.27 (release notes, download) respectively.

Compared to the previous release, the Galera Replication library has a few notable fixes: it has enhanced UUID detection, and it now builds on esoteric platforms, benefiting distributions such as Debian that ship Galera Cluster on platforms like hppa, hurd-i386 and kfreebsd. The 5.7.27-25.19 release also fixes a segmentation fault (segfault) when the wsrep_provider is set to none.

This release marks the last release for OpenSUSE 13.1 as the release itself has reached End-of-Life (EOL) status. It should also be noted that the next release will mark the EOL for OpenSUSE 13.2. If you are still using this base operating system and are unable to upgrade, please contact info@codership.com for more information.

You can get the latest release of Galera Cluster from http://www.galeracluster.com. There are package repositories for Debian, Ubuntu, CentOS, RHEL, OpenSUSE and SLES. The latest versions are also available via the FreeBSD Ports Collection.

 

by Colin Charles at September 05, 2019 09:20 AM

September 04, 2019

OpenStack Superuser

Tapping into Roots to Accelerate Open Infrastructure Growth in Japan

It is always a delight to be in Tokyo.  The people, the food, the tradition… if times were different, I would seriously consider becoming an expat.  More than that, Tokyo has a vibrant and involved OpenStack community that never ceases to invigorate my feelings around open source.  You can see it in every aspect of Cloud Native Days Japan, where the concept of open infrastructure comes to life through demos and talks from companies like NEC, NTT, Red Hat, Cyber Agent, Yahoo! Japan, and Fujitsu.

This is the second event that merges OpenStack Day Japan with the CNCF’s Cloud Native Day, and the growth of the event speaks to its success.  Our host and friend Akihiro Hasegawa kicked things off, noting over 1,600 attendees made for a standing-room only keynote from:

  • Mark Collier / Chief Operating Officer, OSF
  • Noriaki Fukuyasu / VP of Japan Operations, The Linux Foundation
  • Melanie Cebula / Software Engineer, Airbnb
  • Stephan Fabel / Director of Product, Canonical Ltd.
  • Doug David / STSM, Offering Manager Knative, IBM

The key takeaway? Open infrastructure plays a critical role and will continue to do so for companies working in the cloud. OSF’s Mark Collier reinforced the message brought forth at the Open Infrastructure Summit Denver. “Collaboration without boundaries works, proving that it is one of the best ways to produce software,” he said. OpenStack has had 19 on-time releases and is one of the top three most active open source projects. He cited “no better example than CERN,” which runs Kubernetes on top of OpenStack to create one of the largest open source clouds in the world.

Stephan Fabel followed up with a timely and related talk, 10 New Rules of Open Infrastructure. Rule #1: “Consume unmodified upstream”. “The whole point of open infrastructure is to be able to engage with the larger community for support and to create a common basis for hiring, training and innovating on your next-generation infrastructure platform.” His message was a strong follow-up to Collier’s, with a clear message around the power of open source and open infrastructure for Canonical’s customers.

Beyond the keynotes, there were packed rooms for talks on Kata Containers, Zuul CI, the future of the OpenStack community, and a deep dive into OpenStack in the financial sector from Y Jay FX.  OpenStack was well represented, along with some in depth content around Kubernetes from CNCF.

On Monday evening, participants were treated to an OpenStack Foundation birthday celebration in the marketplace (one of 25 in the world!), along with a raffle of prizes from sponsors and local businesses.  It was an amazing event in a style befitting Tokyo.  We’re extremely grateful to the OpenStack Japan User Group and Akihiro Hasegawa in particular for their continued support and efforts towards the open infrastructure community.

Want to collaborate with the global open infrastructure community? Check out the upcoming Open Infrastructure Summit Shanghai happening November 4-6 or check out the OSF events page for upcoming local meetups and Open Infrastructure Days near you!

The post Tapping into Roots to Accelerate Open Infrastructure Growth in Japan appeared first on Superuser.

by Jimmy McArthur at September 04, 2019 01:00 PM

September 03, 2019

OpenStack Superuser

Analysis of Kubernetes and OpenStack Combination for Modern Data Centers

For many telecom service providers and enterprises who are transforming their data center to modern infrastructure, moving to containerized workloads has become a priority. However, vendors often do not choose to shift completely to a containerized model. 

Data centers have to support virtual machines (VMs) as well to keep up with legacy VMs. Therefore, a model of managing virtual machines with OpenStack and containers using Kubernetes has become popular. An OpenStack survey conducted in 2018 found that 61% of OpenStack deployments are also working with Kubernetes.

Apart from this, some of the recent tie-ups and releases of platforms clearly show this trend. For example:

  • AT&T’s three-year deal with Mirantis to develop a 5G core backed by Kubernetes and OpenStack,
  • Platform9’s Managed OpenStack and Kubernetes – providing the required feature sets bundled in a solution stack for service providers as well as developers. They also support Kubernetes on the VMware platform.
  • Nokia’s CloudBand release – containing Kubernetes and OpenStack for workload orchestration
  • The OpenStack Foundation’s recently announced Airship project, which brings the power of OpenStack and Kubernetes together in one framework.

The core part of a telecom network or any virtualized core of a data center has undergone a revolution, shifting from Physical Network Functions (PNFs) to Virtual Network Functions (VNFs). Organizations are now adopting Cloud-Native Network Functions (CNFs) to help bring CI/CD-driven agility into the picture. 

The journey is shown in one of the slides from the Telecom User Group session at KubeCon Barcelona in May 2019, which was delivered by Dan Kohn, the executive director of CNCF, and Cheryl Hung, the director of ecosystem at CNCF.

Figure – PNFs to VNFs

Image source: https://kccnceu19.sched.com/event/MSzj/intro-deep-dive-bof-telecom-user-group-and-cloud-native-network-functions-cnf-testbed-taylor-carpenter-vulk-coop-cheryl-hung-dan-kohn-cncf

According to the slide, application workloads deployed in virtual machines (VNFs) and containers (CNFs) can presently be managed with OpenStack and Kubernetes, respectively, on top of bare metal or any cloud. The optional part, ONAP, is a containerized MANO framework managed with Kubernetes.

As discussed in the birds-of-a-feather (BoF) Telecom User Group session delivered by Kohn, with the progress of Kubernetes in the cloud-native movement, CNFs are expected to become a key workload type. Kubernetes will be used to orchestrate CNFs as well as VNFs. VNFs will be segregated with KubeVirt, Virtlet or OpenStack on top of Kubernetes.

Approaches for managing workloads using Kubernetes and OpenStack

Let’s understand the approaches of integrating Kubernetes with OpenStack for managing containers and VMs.

The first approach is a basic one in which Kubernetes co-exists with OpenStack to manage containers. It gives good performance, but you cannot manage unified infrastructure resources through a single pane. This causes problems with planning and devising policies across workloads. It can also be difficult to diagnose problems affecting the performance of resources in operation.

The second approach is running a Kubernetes cluster in a VM managed by OpenStack. This enables OpenStack-based infrastructure to leverage the benefits of Kubernetes within a centrally managed OpenStack control system. It also allows full multi-tenancy and security benefits for containers in an OpenStack environment. However, it contributes to performance lags and necessitates additional workflows to manage the VMs that are hosting Kubernetes.

The third approach is an innovative one, leaning towards a completely cloud-native environment. In this approach, OpenStack can be replaced with Kubernetes, which manages containers and VMs alike. Workloads take complete advantage of hardware accelerators, Smart NICs etc. With this, it is possible to offer integrated VNS solutions with container workloads for any data center, but it demands improved networking capabilities like those in OpenStack (SFC, provider networks, segmentation).
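
As a concrete illustration of the second approach, OpenStack’s Magnum service can provision a Kubernetes cluster inside Nova VMs with a couple of CLI calls. A minimal sketch, with illustrative image, flavor and network names that will differ in a real deployment:

openstack coe cluster template create k8s-template \
  --image fedora-coreos-latest \
  --external-network public \
  --master-flavor m1.medium --flavor m1.medium \
  --coe kubernetes

openstack coe cluster create demo-cluster \
  --cluster-template k8s-template \
  --master-count 1 --node-count 3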

Kubernetes Vs OpenStack. Is it true?  

If you look at the schedule for the upcoming VMworld US 2019, it is clear that Kubernetes will be everywhere. There will be 66 sessions and some hands-on training focused solely on Kubernetes integration in every aspect of IT infrastructure.

But is that the end of OpenStack? No. As we have already seen, the combination of both systems will be a better bet for any organization that wants to stick with traditional workloads while gradually moving to a new container-based environment.

How are Kubernetes and OpenStack going to combine?

I came across a very decent LinkedIn post by Michiel Manten. He stated that there are downsides to both containers and VMs. Both have their own use cases and orchestration tools. OpenStack and Kubernetes will complement each other if properly combined, running some workloads in VMs to get isolation benefits within a server and others in containers. One way to achieve this combination is to run Kubernetes clusters within VMs in OpenStack, which eliminates the security pitfalls of containers while leveraging the reliability and resiliency of VMs.

What are the benefits?

  • Combining the systems will immediately benefit all current workloads, so enterprises can start their modernization progress while maintaining high speed at a much lower cost than commercial solutions.
  • Kubernetes and OpenStack can be an ideal and flexible solution for any form of a cloud or new far-edge cloud where automated deployment, orchestration, and latency will be the concern.
  • All workloads will be in a single network in a single IT ecosystem. This makes it easier to apply high-level network and security policies.
  •  OpenStack supports most enterprise storage and networking systems in use today. Running Kubernetes with and on top of OpenStack enables a seamless integration of containers into your IT infrastructure. Whether you want to run containerized applications bare metal or VMs, OpenStack allows you to run containers the best way for your business.
  •  Kubernetes has self-healing capabilities for infrastructure. As it is integrated into an OpenStack, it can enable easy management and resiliency to failure of core services and compute nodes.
  • The recent 19th release of OpenStack (OpenStack Stein) has several enhancements to support Kubernetes in the stack. The team behind the OpenStack Certified Kubernetes installer made it possible to deploy all containers in a cluster within 5 minutes regardless of the number of nodes, down from the previous 10-12 minutes. With this, we can launch a very large-scale Kubernetes environment in 5 minutes.

Telecom service providers who have taken steps towards 5G agree that a cloud-native core is imperative for a 5G network. OpenStack and Kubernetes are mature, open-source operating and orchestration frameworks today. Agility is the key capability Kubernetes provides for data centers, and OpenStack has several successful projects focusing on the storage and networking of workloads, plus support for myriad applications.

About the author

Sagar Nangare is a technology blogger, focusing on data center technologies (networking, telecom, cloud, storage) and emerging domains (edge computing, IoT, machine learning, AI). He works at Calsoft Inc. as a digital strategist.

Photo // CC BY NC

The post Analysis of Kubernetes and OpenStack Combination for Modern Data Centers appeared first on Superuser.

by Sagar Nangare at September 03, 2019 01:00 PM

August 31, 2019

Ghanshyam Mann

OpenStack CI/CD migration from Ubuntu Xenial -> Bionic (Ubuntu LTS 18.04)

Ubuntu Bionic (Ubuntu 18.04 LTS) was released on April 26, 2018, but OpenStack CI/CD and all of its gate jobs were still running on Ubuntu Xenial. We had to migrate OpenStack to Ubuntu Bionic and make sure all our gate jobs were tested with the Ubuntu Bionic image.

Jens Harbott (frickler) and I started this task in December 2018 in two phases: phase 1 to migrate the zuulv3 devstack-based jobs and phase 2 to migrate the legacy zuulv2 jobs.

    What is the meaning of migration:

OpenStack CI/CD is implemented with Zuul: jobs prepare the node, deploy OpenStack using DevStack and run tests (Tempest or its plugins, project in-tree tests, rally tests etc). The base OS installed on the node is where OpenStack will be deployed by DevStack.

Until the OpenStack Rocky release, the base OS on the node was Ubuntu Xenial. DevStack would deploy OpenStack on Ubuntu Xenial and then run tests to make sure every project governed by OpenStack works properly on Ubuntu Xenial.

With the new Ubuntu Bionic version, the node base OS has been moved from Ubuntu Xenial to Ubuntu Bionic. In the same way, this makes sure OpenStack CI/CD works with Ubuntu Bionic: on every code change, it verifies that OpenStack works properly on Ubuntu Bionic.

    Goal:

The end goal is to migrate all the OpenStack projects’ gate job testing from Ubuntu Xenial to Ubuntu Bionic and make sure OpenStack works fine on Bionic.

As OpenStack also supports stable branches, all jobs running on the gate up to OpenStack Rocky stay on Ubuntu Xenial, and all jobs running from OpenStack Stein onwards run on Ubuntu Bionic.

    Phase1:

It started with the devstack-based zuul v3 jobs first:

  • https://etherpad.openstack.org/p/devstack-bionic

The devstack base job nodeset was switched to Bionic and, before we merged the devstack patch, we had DNM (Do Not Merge) testing patches on each project's side to make sure we did not break anyone. A few projects faced issues and fixed them before the devstack patch merged.

You can check more details on the mailing list.

    Phase2:

After finishing the devstack zuul v3 native jobs, we had to move all legacy jobs to Bionic as well. Most of the projects were still using legacy jobs, and as per the Stein PTI we needed to move to the next Python versions (py3.6, and py3.7 in Train).

Work is tracked in https://etherpad.openstack.org/p/legacy-job-bionic

The legacy-base and legacy-dsvm-base job nodesets were moved to Bionic and, before that merged, I pushed DNM testing patches for all the projects.

Step 1. Push the testing DNM patch on your project owned repos with:

– Depends-On: https://review.openstack.org/#/c/641886/

– Remove or change the nodeset to Bionic in your repo-owned legacy jobs (if any of the legacy jobs has overridden the nodeset from the parent job). Example: https://review.openstack.org/#/c/639017

Step 2. If you have any legacy job not derived from the base jobs, you need to migrate its nodeset to Bionic; the Bionic nodesets are defined in https://review.openstack.org/#/c/639018/. Example: https://review.openstack.org/#/c/639361/
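
For illustration, such a nodeset switch in a legacy job definition might look roughly like this (the job and nodeset names below are only placeholders; the real Bionic nodesets are the ones defined in the review above):

- job:
    name: legacy-example-dsvm-job
    parent: legacy-dsvm-base
    nodeset: ubuntu-bionic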

Step 3. If any of the jobs start failing on Bionic, either fix it before the March 13th deadline or make the failing job non-voting (n-v) and fix it later.

You can check more details on the mailing list.

The diagram below gives a quick glance at the base job nodesets using Bionic.

If you want to verify the nodeset used in your zuul jobs, you can see the hostname and label in job-output.txt

 

    Migrate the third-party CI to Bionic:

In the same way, you can migrate your third-party CI to Bionic. If a third-party job uses the base job without overriding the ‘nodeset’, the job is automatically switched to Bionic. If the job overrides the ‘nodeset’, you can use a Bionic node to test on Bionic. Third-party CI jobs are not migrated as part of the upstream migration.

    Completion Summary Report:

– Started this migration very late in Stein (in December 2018)

– Finished the migration in two parts: 1. zuulv3 native job migration, 2. legacy job migration

– networking-midonet Bionic jobs are n-v because it is not yet ready on Bionic

– ~50 patches merged to migrate the gate jobs

– ~60 DNM testing patches done before the migration happened

– We managed almost zero downtime in the gate, except for Cinder, which was down for one day

I also sent a summary of this work to the OpenStack mailing list – here with all references.

by Ghanshyam Mann at August 31, 2019 07:00 PM

August 30, 2019

Aptira

Apigee API Translation

One of the challenges we’ve faced recently involved an API translation mechanism required to perform API translations among different components’ native APIs, delivering a response as per a pre-determined set of requirements.


The Challenge

API translation between different solution components is always required for an application to run. Integrating many different software components or products can be relatively easy if they can communicate with each other through a single communication channel. Thus, having a single API gateway which can be used by all other components for seamless communication among them is always a win-win situation.

One of our customers wanted to expose services as a set of HTTP endpoints. This was required so that client application developers can make HTTP requests to these endpoints. Depending on the endpoint, the service might then return data, formatted as XML or JSON, back to the client app. Content mapping was also one of their major requirements, i.e. modifying the input data to a REST API on the fly and then extracting the desired values from the response as required.

Because the customer wanted to make their service available to the web, they wanted to make sure that all necessary steps had been taken to secure and protect their services from unauthorized access. They also wanted the services to be easily consumed by other apps/components, enabling them to change the backend service implementation without affecting the public API.


The Aptira Solution

Google Cloud’s Apigee API Platform was selected for this project due to its extensive set of features that satisfied customer requirements such as rate limiting, data translation, flexible deployment options and the API Portal.

We deployed the On-Premises version of Apigee in a Private Cloud environment and then created an Apigee proxy for API translation as per the Telemanagement Forum (TMF). This proxy used several of Apigee’s inbuilt policies for translating the Cloudify API as per TMF.

  • The proxy’s PreFlow uses Extract Message policies to extract the Cloudify blueprint ID from the input JSON (see the request sketch after this list). This is then used by a Service Callout policy to create the Cloudify deployment
  • Once the deployment is complete, a deployment ID is extracted using an Extract Message policy
  • This is followed by a Service Callout policy to start the deployment execution
  • Once the execution has finished, an Assign Message policy is used to create an NBI service order as per TMF standards, using the deployment ID that was generated earlier
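
A minimal sketch of the kind of request such a proxy might receive (the endpoint path and JSON field name are purely illustrative, not taken from the customer’s deployment):

curl -X POST https://apigee-gw.example.com/v1/nfv/deployments \
  -H "Content-Type: application/json" \
  -d '{"blueprint_id": "example-blueprint"}'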

The Result

Apigee has enabled us to perform API translations among different components’ native APIs, delivering a response as per a pre-determined set of requirements.


OTHER APIGEE CASE STUDIES

Let us make your job easier.
Find out how Aptira's managed services can work for you.

Find Out Here

The post Apigee API Translation appeared first on Aptira.

by Aptira at August 30, 2019 01:28 PM

Chris Dent

Placement Update 19-34

Welcome to placement update 19-34. Feature Freeze is the week of September 9th. We have features in progress in placement itself (consumer types) and osc-placement that would be great to land.

Most Important

In addition to the features above, we really need to get started on tuning up the documentation so that same_subtree and friends can be used effectively.

It is also time to start thinking about what features, if any, need to be pursued in Ussuri. If there are few, that ought to leave time and energy for getting the osc-placement plugin more up to date.

And, there are plenty of stories (see below) that need attention. Ideally we'd end every cycle with zero stories, including removing ones that no longer make sense.

What's Changed

  • Tetsuro has picked up the baton for performance and refactoring work and found some improvements that have merged. There's additional work in progress (noted below).

Stories/Bugs

(Numbers in () are the change since the last pupdate.)

There are 25 (2) stories in the placement group. 0 (0) are untagged. 5 (1) are bugs. 4 (0) are cleanups. 12 (1) are rfes. 4 (0) are docs.

If you're interested in helping out with placement, those stories are good places to look.

osc-placement

osc-placement is currently behind by 12 microversions.

  • https://review.opendev.org/666542 Add support for multiple member_of. There's been some useful discussion about how to achieve this, and a consensus has emerged on how to get the best results.

  • https://review.opendev.org/640898 Adds a new --amend option which can update resource provider inventory without requiring the user to pass a full replacement for inventory and an --aggregate option to set inventory on all the providers in an aggregate. This has been broken up into three patches to help with review. This one is very close but needs review from more people than Matt.

Main Themes

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting.

I picked this up yesterday and hope to have it finished next week, barring distractions. I figure having it in place for nova for Ussuri is a nice to have.

Cleanup

Cleanup is an overarching theme related to improving documentation, performance and the maintainability of the code. The changes we are making this cycle are fairly complex to use and are fairly complex to write, so it is good that we're going to have plenty of time to clean and clarify all these things.

Performance related explorations continue:

One outcome of the performance work needs to be something like a Deployment Considerations document to help people choose how to tweak their placement deployment to match their needs. The simple answer is use more web servers and more database servers, but that's often very wasteful.

Discussions about using a different JSON serializer ended with a decision not to use orjson because it presents some packaging and distribution issues that might be problematic. There's still an option to use one of the other alternatives, but that exploration has not started.

Other Placement

Miscellaneous changes can be found in the usual place.

  • https://review.opendev.org/676982 Merge request log and request id middlewares is worth attention. It makes sure that all log messages from a single request use a global and local request id.

There are two os-traits changes being discussed. And zero os-resource-classes changes.

Other Service Users

New discoveries are added to the end. Merged stuff is removed. Anything that has had no activity in 4 weeks has been removed.

End

by Chris Dent at August 30, 2019 09:48 AM

August 29, 2019

Aptira

Implementing TMF APIs in Apigee

Aptira Apigee TMF APIs

A large Telco is building an Orchestration platform to orchestrate workloads that will be deployed in their Private Cloud infrastructure and spread across multiple data centers. Their internal systems must also integrate with the platform, requiring the implementation of TMF APIs.


The Challenge

The customer’s internal environment includes a large set of systems, including Product Ordering, operations support systems (OSS), business support systems (BSS) and catalog systems that are functionally North-bound systems to the Orchestration platform. To effectively integrate these systems, the platform requires implementation of North bound Interfaces (NBI) between the Orchestration platform and those systems.

The core challenge was to ensure that requests for Customer Facing Services (CFS) are translated into necessary actions on the Resource Facing Services (RFS) or Network service instances. These instances are managed by Cloudify in the NFVi Domain to implement the required service changes.


The Aptira Solution

In order to support and enable the customer’s roadmap of transforming these IT systems and to enable seamless integration of these systems with the orchestration platform, a common API framework using standard Telemanagement Forum (TMF) APIs was proposed.

TMF defines a set of guidelines, standardizing the way IT systems interact to manage and deliver complex services such as Customer Facing Services (CFS) and Resource Facing Services (RFS). It defines a suite of APIs (based on the OpenAPI standard) that define the specifications and data model for each type of function invoked by IT systems. Examples include Service Order and Activation, Service Catalog, and Service Qualification.

Aptira’s design for the North Bound Interface (NBI) function was to use Google Cloud’s Apigee API Platform as the implementation platform, and to make use of its various configuration and customization capabilities. Apigee provides extensive API mapping and logging functions that they call policies. These policies can then be applied to each API call that transits through the platform. In addition to this, there are multiple options for more extensive customisation through code development.

To handle the TMF API requests from the NBI systems in the Orchestration platform, APIs were implemented in Apigee in the following stages, for each TMF-specified function call that needs to be handled from northbound systems:

Stage 1

  • An API proxy endpoint is created in Apigee using the Apigee administration portal
  • Security policies to enable communication between Apigee and NBI systems are configured
  • A business workflow is designed by defining actions that need to be triggered based on the TMF APIs data model
  • Using Apigee’s language constructs available in the developer portal, a policy is defined that extracts the parameters from the payload of the requests
  • Each API’s data model is interpreted in a specific way, creating a workflow policy

For example: To initiate a Heal operation on a network service, the IT systems trigger TMF API 664 Resource Activation and Configuration by specifying the Heal action. The request payload has the following parameters:

  • resourceFunction – Identifier of Network service to be healed
  • healPolicy – A set of custom scripts/policy to trigger healing
  • plus additional parameters
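
A minimal sketch of what such a heal request might look like when sent to the Apigee proxy (the base path and field values below are illustrative assumptions, not the customer’s actual endpoint):

curl -X POST https://apigee-gw.example.com/tmf-api/resourceActivation/heal \
  -H "Content-Type: application/json" \
  -d '{"resourceFunction": "ns-instance-001", "healPolicy": "restart-failed-vnfcs"}'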

Stage 2

  • Once the TMF API payload has been extracted and its parameters interpreted (based on the workflow defined in Stage 1), actions are triggered by making an API call. Apigee adds the network identifier parameters extracted in Stage 1 to the API payload sent to the southbound systems.
  • Each NBI API call is converted into a Southbound Interface API call to the relevant system that will implement the request. In this case, the main southbound system was the Cloudify orchestrator.

For example: the network identifier from Stage 1 identifies the exact deployment instances in the underlying infrastructure environment. Apigee invokes a Cloudify API call to trigger the specific action, such as Heal, on the network resource.

Since all the operations are designed to be asynchronous, Apigee maintains a transaction state and waits for a response from Cloudify for a completion of the heal action. Once the heal action is completed, a response is sent back to the originating Northbound API using the same API proxy endpoint.


The Result

Utilising Apigee’s API translation mechanism, we were able to demonstrate all the customer use cases that involve NBI system integration with the platform.

The following TMF APIs were implemented:

  • TMF 641 Service Activation and Ordering
  • TMF 633 Service Catalog
  • TMF 664 Resource Function Activation and Configuration

As a result of this exercise, API developers now have a better idea of the extensive set of constructs in Apigee with which APIs can be built to develop applications across any technology domain.


OTHER APIGEE CASE STUDIES

Become more agile.
Get a tailored solution built just for you.

Find Out More

The post Implementing TMF APIs in Apigee appeared first on Aptira.

by Aptira at August 29, 2019 01:03 PM

August 28, 2019

OpenStack Superuser

A Global Celebration “For The Love of Open!”

Since July 2010, the global community has celebrated the OpenStack project’s birthday. User Groups all over the world have hosted celebrations around July – August, presenting slide decks, eating cupcakes, and spending time with local community members to commemorate this milestone.

Now that the OpenStack Foundation family has grown with the addition of Zuul, Kata, StarlingX, and Airship, we’ve timed the annual celebration around the establishment of OSF, which was in July 2012, and invited the entire OSF family to celebrate! This year, we’re celebrating quite a few milestones:

  • 105,000 members in 187 countries from 675 organizations, making OSF one of the largest global open source foundations in the world
  • OpenStack is one of the top 3 most active open source projects, with over 30 OpenStack public cloud providers around the world and adoption by global users including Verizon, Walmart, Comcast, Tencent and hundreds more
  • Kata Containers is supported by infrastructure donors including Google Cloud, Microsoft, Vexxhost, AWS, PackageCloud, and Packet.com
  • Zuul adoption has accelerated with speaking sessions and case studies by BMW, leboncoin, GoDaddy, the OpenStack Foundation and more
  • StarlingX adoption by China UnionPay, a leading financial institution in China, who will be sharing their use case at the Shanghai Summit
  • Airship elected Technical and Working Committees, and has received ecosystem support from companies including AT&T, Mirantis and SUSE

Photo: Korea User Group

25 User Groups from 20 different countries all over the world celebrated:
Missed your local group’s celebration? Stay in touch and join our OSF Meetup network!

China Open Infrastructure
DRCongo OpenStack Group
Indonesia OpenStack User Group
Japan OpenStack User Group
Korea User Group
OpenInfra Lower Saxony
Open Infrastructure LA
Open Infrastructure Mexico City
Open Infrastructure San Diego
OpenStack Austin, Texas
OpenStack Bangladesh
OpenStack Benin User Group
OpenStack Bucharest, Romania Meetup
OpenStack Côte d’Ivoire
OpenStack Ghana User Group
OpenStack Guatemala User Group
OpenStack Malaysia User Group
OpenStack Meetup Group & SF Bay Cloud Native Containers
OpenStack Nigeria User Group
OpenStack Thailand User Group
OpenStack & OpenInfra Portland, Oregon
OpenStack & OpenInfra Russia Moscow
Tunisia OpenStack User Group
Vietnam User Group
Virginia OpenStack User Group

Photo: Indonesia OpenStack User Group

The User Groups gathered in a variety of sizes, with the largest attracting 200 attendees in Indonesia. Community members gave presentations, handed out awards, printed t-shirts and stickers, and sang birthday songs. From the pictures and feedback received, everyone thoroughly enjoyed their celebrations.

Thank you to all the organizers of the User Groups for bringing your local communities together. To see how other User Groups celebrated, check out the pictures on twitter and flickr.

If you’re looking to join the next meetup in your area, connect with your local User Group here.

Be sure to read up on other blogs and articles written by the Bangladesh and China User Groups.

The post A Global Celebration “For The Love of Open!” appeared first on Superuser.

by Ashlee Ferguson and Ashleigh Gregory at August 28, 2019 03:54 PM

Aptira

Apigee Central Logging

Aptira Apigee Central Logging

Completing a full-stack Private Cloud Evaluation is no mean feat. Central logging – capturing and correlating information from all components and trapping and logging all required data, all without a significant amount of custom development – this is where innovative ideas are made.


The Challenge

One of the key success criteria for this customer’s full-stack Private Cloud Evaluation was the detailed instrumentation of all components, providing fine-grained visibility into the interworking of all components in the solution for each test use case. This required capturing all API calls, the data that was passed over each call, and the resulting responses. There were as many as 10 external integration points and multiple inter-component interfaces across which logs had to be captured and then correlated.

Aptira needed to identify a mechanism to trap these API calls and to log the required data.

The components used in this evaluation included OpenStack, Cloudify, the OpenDaylight SDN Controller and TICKStack. Although each component had its own logging mechanism, capturing the data flow between these different components with a single logging mechanism was difficult. It looked like a significant amount of custom development would be required.


The Aptira Solution

Aptira’s Solutionauts came up with an innovative idea: use the API Gateway component of the solution to implement this central logging capability. This approach would remove the need for significant custom development and avoid the introduction of tools that were only used for the evaluation and had no place in a production environment.

The API gateway used in the solution was the on-premises version of Google Cloud’s Apigee API Platform, so we had all the capability we needed to implement this idea.

Apigee was already configured to manage the external APIs, and we were able to configure it to manage the integration points between multiple interconnected components. Multiple Apigee proxies were created and deployed at all the integration points across the solution. The native APIs of all the components were integrated as backend service endpoints for the APIs managed by Apigee. Apart from using Apigee’s built-in functionality – API rate limiting and API translation – we extended Apigee’s logging policy capabilities into a central logging mechanism. This enabled us to capture all the required interface logs across all the components, which we then used for monitoring and performance analysis.

The power and capability of Apigee provided all the features we needed to implement the desired central logging functionality. All the required data was captured by invoking Apigee’s REST APIs with no involvement of 3rd party custom interfaces.


The Result

Once fully implemented, this central logging mechanism operated smoothly in parallel with the functional API calls that occurred when the system was operating and performing the evaluation use cases and we were able to successfully verify the operation of all use case functions.


OTHER APIGEE CASE STUDIES

Take control of your Cloud.
Get a customised Cloud strategy today.

Learn More

The post Apigee Central Logging appeared first on Aptira.

by Aptira at August 28, 2019 01:06 PM

August 27, 2019

Aptira

Apigee: On-Prem Vs SaaS

A large APAC Telco is building an Orchestration platform to orchestrate workloads that will be deployed in their Private Cloud infrastructure and spread across multiple data centers. Apigee can make this large project relatively simple – but which version is better suited? On-Premises or SaaS?


The Challenge

In order to efficiently orchestrate such large workloads, the customer has requested a common API layer to control and manage traffic between multiple systems and the Orchestration platform. These systems include Operations Support Systems (OSS), Business Support Systems (BSS), Analytics, Product Ordering systems and a WAN Controller. They would also like to expose certain APIs to external partners via a web-based API Portal, and they have a long list of feature requirements, including rate limiting, data translation, flexible deployment options and the API Portal.


The Aptira Solution

We have selected Apigee for this project due to its extensive set of features that satisfied the customer requirements. Aptira designed a deployment architecture for Apigee, taking into consideration the volume of API traffic from the many integration points, Tenancy, Security and Networking.

Out of the box, Apigee supports two types of deployment: Software-as-a-Service (SaaS) and an On-Premises version. The SaaS version satisfied most of the customer requirements and reduced total cost of ownership. However, the design had some major complexities which needed to be addressed.

The first complexity is platform integration over their corporate network. In other words, the data traffic from the SaaS instance to the orchestration platform had to be sent over a secure VPN tunnel which, depending on the customer’s environment, may traverse multiple systems/hops. This would have a significant impact on API response time.

Secondly, the customer has defined a set of regulatory compliance requirements to be validated for the whole orchestration platform. These requirements are often driven by government organizations that host their workloads on the platform. Such workloads often involve software systems that must be integrated with customer systems (hardware equipment or software) hosted within the customer’s environment. This kind of integration is easier to manage in an On-Premises version, by customizing deployments and using 3rd party components for integration. Also, SaaS versions are designed around standard security mechanisms, so compliance with these requirements would require customization of the software components. The problem magnifies if multi-tenant workloads are to be hosted, which would increase the customization effort. This in turn would introduce a dependency on the vendor and the software’s release cycle.

To overcome these two major complexities, Aptira decided to use the On-Premises version of Apigee. The On-Premises version includes an automated mechanism to deploy its sub-systems. This provided control of the infrastructure resources on which they are deployed and allowed fine tuning of resources to host its sub-systems according to the API traffic needs.

Apigee’s automated deployment mechanism has provided complete control over its deployment and the configuration of sub-systems. It is relatively easy to make any customizations to the software components should any new requirements arise since it doesn’t involve a vendor. It is also easier to integrate with co-located systems since the data transfer over the internal network is much faster thereby reducing the API response time.

The benefits of the on-prem deployment of Apigee are balanced against some additional considerations that are absent in the SaaS version. For example, Operations and Maintenance, Resource allocation and Validation. However, the customer had a strong preference for the On-Premises version as they had already completed an independent assessment of the technology for their requirements. Therefore, we could assume that they had already accepted these overheads.

From an integration point of view, we integrated Apigee with Orchestration specific platform systems and the customers environment systems:

  • Cloudify: Service Orchestrator/NFVO
  • TICKStack: event management Analytics engine
  • WAN SDN controller
  • OSS/BSS (Simulated using POSTMAN)

For each integration point, an API proxy endpoint has been created by taking into consideration the security policies that each API endpoint requires. With automation tools in place it is easier to maintain the software and handle operations such as upgrades and disaster recovery. Also, with proper capacity planning and budgeting most of the additional considerations can be adequately handled.

As this project is relatively new for the customer, their team came onboard quickly, seamlessly integrated with Aptira’s project collaboration processes, and addressed each requirement in the solution space, thereby helping us resolve queries faster during the design phase. It is also worth noting that the support Aptira received from Apigee staff has been extremely beneficial in providing the required outcome in a timely fashion for this solution.


The Result

Aptira designed the On-Prem Apigee deployment meeting all customer requirements and taking into account all considerations mentioned above. The design not only had seamless integration between all systems using the API gateway’s mechanism but also required minimal changes in the customer’s environment.

Aptira implemented a full-stack solution configuration with Apigee as the system-wide API Gateway that enabled its capabilities to be validated by live execution of telco workloads.


OTHER APIGEE CASE STUDIES

Take control of your Cloud.
Get a customised Cloud strategy today.

Learn More

The post Apigee: On-Prem Vs SaaS appeared first on Aptira.

by Aptira at August 27, 2019 01:43 PM

Galera Cluster by Codership

Galera Cluster hiring for Quality Assurance Engineer

Do you think Quality Assurance (QA) is more than the simplistic view of bug hunting? Do you believe that QA is important to the entire software development lifecycle and want to focus on failure testing, process and performance control, as well as best practice adoption? Do you enjoy doing performance benchmarks and noticing regressions? Do you like to write about it, from internal reports to external blog posts?

Then why not take up a challenge at Codership, the makers of Galera Cluster, as we are looking for a Galera Cluster QA Engineer (job description at link).

We’re looking for someone who is able to work remotely, join a company meeting at least once per year, and be comfortable using Slack and email (asynchronous communication, for developing our virtually synchronous replication solution!), but who most importantly enjoys testing the application with a methodical approach. You will also get to verify bugs reported by users. And let us not forget, this job requires good knowledge of MySQL and MariaDB Server, which is where our replication layer lives.

Please send your CV to jobs@galeracluster.com. Looking forward to hearing from you!

by Colin Charles at August 27, 2019 08:45 AM

August 26, 2019

Ed Leafe

Moving On

It’s been a great run, but my days in the OpenStack world are coming to an end. As some of you know already, I have accepted an offer to work for DataRobot. I only know bits and pieces of what I will be working on there, but one thing’s for sure: it won’t be on … Continue reading "Moving On"

by ed at August 26, 2019 03:06 PM

Aptira

Apigee Service Orchestration and Integration

Aptira Apigee Service Orchestration

End-to-End (E2E) Orchestration of services is achieved when the lifecycle events of network services are managed both at the infrastructure level and across multiple domains. This customer required lifecycle orchestration of services across multiple vendor-specific OSS/BSS systems.


The Challenge

One of our Communication Service Provider (CSP) customers required E2E orchestration of network services across different technology domains, including integration with their external business systems. Since each OSS/BSS system is implemented with its own set of interfaces, there are significant challenges in integrating multiple systems with a common orchestration platform. For this reason, a common API layer is required to handle these patterns.


The Aptira Solution

The key components of the Orchestration platform being developed include:

  • Network Functions Virtualisation Orchestrator (NFVO)
    • The NFVO is responsible for resource orchestration of network services, modelled using TOSCA, across Network Functions Virtualisation Infrastructure (NFVI) domains. The NFVO has visibility of all the southbound components and manages the lifecycle of services at the infrastructure level, i.e. at the Virtualised Infrastructure Manager (VIM) and Software Defined Networking (SDN) layers. However, it does not have visibility across technology domains. In this customer’s platform, the NFVO is implemented using Cloudify.
  • Service Orchestrator (SO)
    • The SO handles the E2E orchestration of network services across different technology domains (such as wireline, radio, Evolved Packet Core and access networks) by integrating with product ordering systems and OSS/BSS from different vendors, which support a variety of interfaces such as REST, SOAP or proprietary protocols. In this customer’s platform the SO is also implemented using Cloudify.

The SO and NFVO communicate with each other extensively, but they also must integrate with multiple external systems. By convention, interfaces with OSS/BSS and other service-level systems are called Northbound Interfaces (NBI). Similarly, interfaces with network-level resource management systems, such as network elements and WAN SDN controllers, are called Southbound Interfaces (SBI).
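
As a rough sketch of how these layers relate, the gateway effectively maintains a mapping from northbound, service-level API calls down to the southbound systems that fulfil them. The mapping below is purely illustrative (the real routing lives in gateway proxy configuration and orchestrator workflows, not application code):

    # Purely illustrative view of the NBI/SBI layering.
    NBI_TO_SBI = {
        "Service order (create)": [
            "SO (Cloudify)",        # decomposes the E2E order per domain
            "NFVO (Cloudify)",      # instantiates services on the NFVI (VIM/SDN)
            "WAN SDN controller",   # provisions transport segments (e.g. VPN)
        ],
        "Service catalog (query)": ["SO service catalog"],
        "Resource function activation": ["NFVO (Cloudify)"],
    }

    for nbi_call, fulfilment_path in NBI_TO_SBI.items():
        print(nbi_call, "->", " -> ".join(fulfilment_path))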

Another of the customer’s requirements was that access to limited parts of the system functionality be made available to third parties. They wanted to expose public APIs via an API portal and allow trusted, qualified third parties to access API documentation and sandpit environments.

All of this leads to a significant number of APIs being implemented in the orchestration platform. To integrate the orchestration platform seamlessly with external systems and expose public APIs in a controlled manner, a common API management layer is required.

An API management layer mediates between multiple integrated systems that communicate via API calls. Aptira determined that Google Cloud’s Apigee API Platform was the right product to meet these requirements.

Apigee is used as the API gateway in the orchestration platform. In addition to its standard set of features, such as API rate limiting, security handling and API analytics, it comes with a rich set of language constructs for creating highly specialised API transforms that mediate between the APIs of different systems. The NBI in this solution implements standard TM Forum (TMF) Open APIs, and the core of the API layer is implemented using Apigee.

To demonstrate an end-to-end orchestration use case, we simulated an environment in which Apigee is integrated with Cloudify (as the NFVO) by defining business workflows in Apigee that translate TMF API calls from northbound systems into NFVI domain-specific orchestration API calls. For instance, orchestration of a firewall service and a core telco vIMS service was demonstrated using a single TMF service order and activation API call.
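
As a minimal sketch of what that translation amounts to, the Python below maps a TMF 641-style service order onto Cloudify-style deployment and install calls. Field names, blueprint identifiers, credentials and endpoint paths are illustrative only, and in the real solution this logic is expressed in Apigee workflows and policies rather than standalone code; the Cloudify endpoints shown follow its REST API in spirit and should be verified against the deployed version.

    import requests

    CLOUDIFY = "https://nfvo.example.internal/api/v3.1"  # illustrative NFVO endpoint
    AUTH = ("admin", "not-a-real-password")              # illustrative credentials

    # Illustrative mapping from order item service types to TOSCA blueprints.
    BLUEPRINTS = {"firewall": "fw-service-blueprint", "vIMS": "vims-core-blueprint"}

    def translate_service_order(order: dict) -> None:
        """Turn one TMF 641-style service order into NFVO orchestration calls."""
        for item in order.get("orderItem", []):
            service = item["service"]
            blueprint = BLUEPRINTS[service["serviceType"]]
            inputs = {c["name"]: c["value"]
                      for c in service.get("serviceCharacteristic", [])}
            deployment_id = f'{service["serviceType"]}-{order["id"]}'

            # Create a deployment from the blueprint, then run its install workflow.
            requests.put(f"{CLOUDIFY}/deployments/{deployment_id}",
                         json={"blueprint_id": blueprint, "inputs": inputs},
                         auth=AUTH, timeout=30).raise_for_status()
            requests.post(f"{CLOUDIFY}/executions",
                          json={"deployment_id": deployment_id, "workflow_id": "install"},
                          auth=AUTH, timeout=30).raise_for_status()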

The following APIs were implemented:

  • TMF 641 Service Activation and Ordering
  • TMF 633 Service Catalog
  • TMF 664 Resource Function Activation and Configuration

The Southbound Interfaces of the Service Orchestrator are represented by the orchestration layer in the NFVI domain (i.e. the NFVO) and in the transport domain (i.e. the WAN SDN controller). In order to demonstrate an end-to-end orchestration use case involving the transport domain, we set up a WAN topology using a set of OVS switches, integrating Cloudify in the SO layer with a WAN SDN controller.

The ordering system adds details, including service-level details, SLAs and waypoints, to the TMF 641 request to set up a WAN service such as a VPN or MPLS service. Apigee, using the workflow mechanism described above, translates the API request into an orchestration request to Cloudify in the SO layer, adding the parameters required to instantiate the VPN service.
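
As an illustration, the translation essentially lifts the SLA and waypoint characteristics out of the order and re-expresses them as orchestration inputs. The payload below is a hypothetical, heavily simplified stand-in for a real TMF 641 order item, and the field names are not authoritative:

    # Hypothetical, simplified TMF 641-style order item for a WAN VPN service.
    order_item = {
        "action": "add",
        "service": {
            "serviceType": "l3vpn",
            "serviceCharacteristic": [
                {"name": "sla.latencyMs", "value": "30"},
                {"name": "waypoints", "value": "mel-edge-01,syd-core-02,syd-edge-07"},
            ],
        },
    }

    def to_vpn_orchestration_inputs(item: dict) -> dict:
        """Re-shape TMF characteristics into inputs for a (hypothetical) VPN blueprint."""
        chars = {c["name"]: c["value"]
                 for c in item["service"]["serviceCharacteristic"]}
        return {
            "service_type": item["service"]["serviceType"],
            "latency_ms": int(chars["sla.latencyMs"]),
            "waypoints": chars["waypoints"].split(","),
        }

    # -> {'service_type': 'l3vpn', 'latency_ms': 30,
    #     'waypoints': ['mel-edge-01', 'syd-core-02', 'syd-edge-07']}
    print(to_vpn_orchestration_inputs(order_item))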


The Result

The result is an API framework that can be extended to other complex business use cases: it not only orchestrates services at the infrastructure level but also makes integration with systems such as product ordering systems seamless.

The Apigee API platform was able to handle generic API management tasks as well as deeply specialised telco requirements. This ultimately helps the customer roll out services faster, meeting their business objectives and allowing them to adapt rapidly to future changes.


OTHER APIGEE CASE STUDIES

Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Apigee Service Orchestration and Integration appeared first on Aptira.

by Aptira at August 26, 2019 01:12 PM

August 22, 2019

Galera Cluster by Codership

Running Galera Cluster on Microsoft Azure and comparing it to their hosted services (EMEA and USA webinar)

Do you want to run Galera Cluster in the Microsoft cloud? Why not learn to set up a 3-node Galera Cluster using Microsoft Azure Compute Virtual Machines, and run it yourself. In this webinar, we will cover the steps to do this, with a demonstration of how easy it is for you to do.

In addition, we will cover why you may want to run a 3-node (or more) Galera Cluster (active-active multi-master clusters) instead of (or in addition to) using Azure Database for MySQL or MariaDB. We will also cover cost comparisons. 

Join us and learn about storage options, backup & recovery, as well as monitoring & metrics options for the “roll your own Galera Cluster” in Azure.

EMEA webinar 10th of September 1-2 PM CEST (Central European Time)
JOIN THE EMEA WEBINAR

USA webinar 10th of September 9-10 AM PDT (Pacific Daylight Time)
JOIN THE USA WEBINAR

Presenter: Colin Charles, Galera Cluster Chief Evangelist, Codership


by Sakari Keskitalo at August 22, 2019 11:40 AM

August 16, 2019

Chris Dent

Placement Update 19-32

Here's placement update 19-32. There will be no update 33; I'm going to take next week off. If there are Placement-related issues that need immediate attention please speak with any of Eric Fried (efried), Balazs Gibizer (gibi), or Tetsuro Nakamura (tetsuro).

Most Important

Same as last week: The main things on the Placement radar are implementing Consumer Types and cleanups, performance analysis, and documentation related to nested resource providers.

A thing we should place on the "important" list is bringing the osc placement plugin up to date. We also need to discuss what we would like the plugin to be. Is it required that it have ways to perform all the functionality of the API, or is it about providing ways to do what humans need to do with the placement API? Is there a difference?

We decided that consumer types is medium priority: The nova-side use of the functionality is not going to happen in Train, but it would be nice to have the placement-side ready when U opens. The primary person working on it, tssurya, is spread pretty thin so it might not happen unless someone else has the cycles to give it some attention.

On the documentation front, we realized during some performance work last week that it is easy to have an incorrect grasp of how same_subtree works when there are more than two groups involved. It is critical that we create good "how to use" documentation for this and other advanced placement features. Not only can it be easy to get wrong, it can be a challenge to see that you've got it wrong (the failure mode is "more results, only some of which you actually wanted").
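
A concrete request helps make the semantics visible. The sketch below (Python, using requests against a placement endpoint) asks for two accelerator groups that must land in the same subtree as the compute group, for example under the same NUMA node. The endpoint, token, suffix names and resource classes are illustrative, and the exact query syntax should be checked against the placement API reference for the microversion in use:

    import requests

    PLACEMENT = "http://placement.example:8778"     # illustrative endpoint
    HEADERS = {
        "X-Auth-Token": "REDACTED",                 # obtained from keystone
        "OpenStack-API-Version": "placement 1.36",  # same_subtree needs a recent microversion
    }

    # Three granular request groups; same_subtree asks that the providers chosen
    # for the listed suffixes all descend from (or are) one common ancestor.
    params = {
        "resources_COMPUTE": "VCPU:4,MEMORY_MB:8192",
        "resources_ACCEL1": "FPGA:1",
        "resources_ACCEL2": "FPGA:1",
        "same_subtree": "_COMPUTE,_ACCEL1,_ACCEL2",
        "group_policy": "none",
    }

    resp = requests.get(f"{PLACEMENT}/allocation_candidates",
                        headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    print(len(resp.json()["allocation_requests"]), "candidates")

The part that needs documenting is which extra candidates come back when only some of the groups are constrained together; the failure mode mentioned above ("more results, only some of which you actually wanted") shows up exactly there.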

What's Changed

  • Yet more performance fixes are in the process of merging. Most of these are related to reducing the impact of _merge_candidates and _build_provider_summaries. The fixes are generally about avoiding duplicate work by generating dicts of reusable objects earlier in the request, which is possible because of the relatively new RequestWideSearchContext. In a request that returns many provider summaries, _build_provider_summaries continues to have a significant impact because it has to create many objects, but overall everything is much less heavyweight. More on performance in Themes, below.

  • The combination of all these performance fixes, together with microversioning, makes it reasonable for anyone running placement in a resource-constrained environment (or simply wanting things to be faster) to consider running Train placement with any release of OpenStack. Obviously you should test it first, but it is worth investigating. More information on how to achieve this can be found in the upgrade to stein docs.

Stories/Bugs

(Numbers in () are the change since the last pupdate.)

There are 23 (1) stories in the placement group. 0 (0) are untagged. 4 (1) are bugs. 4 (0) are cleanups. 11 (0) are rfes. 4 (0) are docs.

If you're interested in helping out with placement, those stories are good places to look.

osc-placement

osc-placement is currently behind by 12 microversions.

  • https://review.opendev.org/666542 Add support for multiple member_of. There's been some useful discussion about how to achieve this, and a consensus has emerged on how to get the best results.

  • https://review.opendev.org/640898 Adds a new '--amend' option which can update resource provider inventory without requiring the user to pass a full replacement for inventory. This has been broken up into three patches to help with review.

Main Themes

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting.

As mentioned above, this is currently paused while other things take priority. If you have time that you could spend on this please respond here expressing that interest.

Cleanup

Cleanup is an overarching theme related to improving documentation, performance and the maintainability of the code. The changes we are making this cycle are fairly complex to use and are fairly complex to write, so it is good that we're going to have plenty of time to clean and clarify all these things.

As said above, there's lots of performance work in progress. We'll need to make a similar effort with regard to docs. For example, all of the coders involved in the creation and review of the same_subtree functionality struggle to explain, clearly and simply, how it will work in a variety of situations. We need to enumerate the situations and the outcomes, in documentation.

One outcome of this work will be something like a Deployment Considerations document to help people choose how to tweak their placement deployment to match their needs. The simple answer is use more web servers and more database servers, but that's often very wasteful.

On the performance front, there is one major area of impact which has not yet received much attention. When requesting allocation candidates (or resource providers) in a way that will return many results, the cost of JSON serialization is just under one quarter of the processing time. This is to be expected when the response body is 2379k big and 154000 lines long (when pretty printed) for 7000 provider summaries and 2000 allocation requests.

But there are ways to fix it. One is to ask more focused questions (so fewer results are expected). Another is to limit=N the results (but this can lead to issues with migrations).

Another is to use a different JSON serializer. Should we do that? It could make a big difference with large result sets (which will be common in big and sparse clouds).
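
As a rough illustration of the kind of comparison involved (orjson here is just one example of a faster serializer, not necessarily the one placement would pick), a micro-benchmark over a synthetic, placement-shaped payload might look like this:

    import json
    import timeit

    import orjson  # pip install orjson; one example of an alternative serializer

    # Synthetic payload roughly shaped like a large allocation_candidates response.
    payload = {
        "provider_summaries": {
            f"rp-{i}": {"resources": {"VCPU": {"capacity": 64, "used": i % 64}}}
            for i in range(7000)
        },
        "allocation_requests": [
            {"allocations": {f"rp-{i}": {"resources": {"VCPU": 1}}}}
            for i in range(2000)
        ],
    }

    for name, dump in [("stdlib json", lambda: json.dumps(payload)),
                       ("orjson", lambda: orjson.dumps(payload))]:
        secs = timeit.timeit(dump, number=20)
        print(f"{name}: {secs / 20 * 1000:.1f} ms per serialization")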

Other Placement

Miscellaneous changes can be found in the usual place.

There are two os-traits changes being discussed. And zero os-resource-classes changes.

Other Service Users

New discoveries are added to the end. Merged stuff is removed. Anything that has had no activity in 4 weeks has been removed.

End

Have a good next week.

by Chris Dent at August 16, 2019 02:34 PM

August 13, 2019

CERN Tech Blog

Nova support for a large Ironic deployment

CERN runs OpenStack Ironic to provision all the new hardware deliveries and the on-demand requests for baremetal instances. It has already replaced most of the workflows and tools used to manage the lifecycle of physical nodes, but we continue to work with the upstream community to improve the pre-production burn-in, the up-front performance validation and the integration of retirement workflows. During the last 2 years the service has grown from 0 to ~3100 physical nodes.

by CERN (techblog-contact@cern.ch) at August 13, 2019 02:00 PM

About

Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology you should add your OpenStack blog.

Last updated:
October 13, 2019 08:24 PM
All times are UTC.

Powered by:
Planet