January 18, 2019

Chris Dent

Placement Update 19-02

Hi! It's a placement update! The main excitement this week is we had a meeting to check in on the state of extraction and figure out the areas that need the most attention. More on that in the extraction section within.

Most Important

Work to complete and review changes to deployment to support extracted placement is the main thing that matters.

What's Changed

  • Placement is now able to publish release notes.

  • Placement is running python 3.7 unit tests in the gate, but not functional (yet).

  • We had that meeting and Matt made some notes.

Bugs

Specs

Last week was spec freeze so I'll not list all the specs here, but for reference: there were 16 specs listed last week, and none of them has since merged or been abandoned.

Main Themes

The reshaper work was restarted after discussion at the meeting surfaced its stalled nature. The libvirt side of things is due some refactoring while the xenapi side is waiting for a new owner to come up to speed. Gibi has proposed a related functional test. All of that at:

Also making use of nested providers is this spectacular stack of code at bandwidth-resource-provider:

Eric's in the process of doing lots of cleanups to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

That stuff is very close to ready and will make lots of people happy when it merges. One of the main areas of concern is making sure it doesn't break things for Ironic.

Extraction

As noted above, there was a meeting which resulted in Matt's Notes, an updated extraction etherpad, and an improved understanding of where things stand.

The critical work to ensure a healthy extraction is with getting deployment tools working. Here are some of the links to that work:

We also worked out that getting the online database migrations happening on the placement side of the world would help:

Documentation is mostly in-progress, but needs some review from packagers. A change to openstack-manuals depends on the initial placement install docs.

There is a patch to delete placement from nova on which we've put an administrative -2 until it is safe to do the delete.

Other

There are 13 open changes in placement itself. Several of those are easy win cleanups.

Of those placement changes, the online-migration-related ones are the most important.

Outside of placement (I've decided to trim this list to just stuff that's seen a commit in the last two months):

End

Because I wanted to see what it might look like, I made a toy VM scheduler and placer, using etcd and placement. Then I wrote a blog post. I wish there was more time for this kind of educational and exploratory playing.

by Chris Dent at January 18, 2019 03:43 PM

Aptira

One Man’s Crush on Technology: OpenKilda

Aptira Crush on Technology: OpenKilda

For good or bad, technologists can be pretty passionate people. I mean, how many other professionals would happily describe an inanimate object, or worse, a virtual concept like software, as sexy? If you were to ask, the reasons for their love of one piece of technology or another would be as personal as, well, anything you might be passionate about. 

For me, it’s the elegance and intelligence of the solution that excites me. Perhaps call it a professional acknowledgment for pragmatic and effective solutions. An appreciation for solutions that have been well thought out and provide opportunities for scale, growth and enhancement. 

It was late in the spring of ‘17 that I first became aware of OpenKilda.  As part of an availability and performance assessment, I had spent some time thinking about what a unique Web-Scale SDN Controller should look like. How should it operate? What were the basic, functional, building blocks that were needed? That was when the slides for OpenKilda crossed my desk. 

The architecture slides are what had me enamoured; built from the ground up using mature, established components to support the challenges of transport and geographically diverse networks. Components that, of themselves, were known for their intelligent design. I’d like to think that if I were going to design an SD-WAN controller, it would look like this. 

OpenKilda set itself apart in the SDN Controller market. It wasn’t trying to be a general SDN Controller, shoehorned into WAN and Transport applications. It was a true WAN and transport SDN solution from birth. 

Still a little immature, was OpenKilda that diamond in the rough we were all looking for? To my eyes the solution was certainly elegant: Lean yet powerful. Simple, yet sophisticated.  But ultimately, there was one thing I could see that had me very excited: Opportunity. 

The value of a product or solution is not in what it does, but the value it can create for others. OpenKilda’s make-up of mature, open-source components like Kafka, Storm and Elastic is what presented that value.  

Access to established communities, plug & play extensions and a wider pool of available talent, meant OpenKilda was potentially more extensible than the others. Across those components, a diverse, already established ecosystem of vendors, service providers and integrators, meant there were potentially more invested interests in its success.  

What’s more, John Vestal and team (OpenKilda’s creators) were eager to share OpenKilda with the world. Hopefully building on, and building out, what they had already started.  Yes, it was fair to say I was excited.  Some birds are simply never meant to be caged. 

 …It would be nearly a year before I could broker a more intimate introduction. A short but deep exploration under the covers as we considered what lay on the road ahead. Telecommunications, Media, Finance; The opportunities are potentially wide and expansive.  Will OpenKilda be the key to unlocking them?  I think it just could be…  

Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post One Man’s Crush on Technology: OpenKilda appeared first on Aptira.

by Craig Armour at January 18, 2019 04:35 AM

January 17, 2019

OpenStack Superuser

Tips for talks on the Public Cloud Track for the Denver Summit

Have an open source public cloud story? It’s time to talk about it. The call for presentations for the first Open Infrastructure Summit is open until next Wednesday, January 23.

Typically, just 25 percent of submissions are chosen for the conference. In light of that fierce competition, Superuser is talking to Programming Committee members of the Tracks for the Denver Summit to help shed light on what topics, takeaways and projects they’re hoping to see in sessions submitted to their Track.

Here we’re featuring the Public Cloud Track with tips from Tobias Rydberg, chair of the OpenStack Public Cloud Working Group. He talked to Superuser about some of the content that should be submitted to this track as well as what attendees can expect. Want more help on your submission? Rydberg offered to help over Twitter direct message or in IRC (#open-infra-summit-cfp) before next week’s deadline.

Public Cloud Track topics

Architecture and hardware, economics, cloud portability, features and needs, federation, operations and upgrades, multi-tenancy, networking, performance, scale, security and compliance, service-level agreements (SLAs), storage, open-source platforms, tools and SDKs, UI/UX, user experience.

What content would you like to see submitted to this Track?

We’re looking for a broad variety of presentations in the Public Cloud Track, everything from the business perspective to technical talks. Summit attendees would love to hear more from operators who have delivered OpenStack as a public cloud: how you as a public cloud operator handle your daily business, what challenges you have and how you solve them. It’s also helpful to share what provisioning tools you’re using and how you manage upgrades.

What will Summit attendees take away from these sessions?

Attending the Public Cloud Track at the OpenInfra Summit in Denver will give attendees a better understanding of the benefits and challenges of using open source in the public cloud sector, both as an operator and as an end user. We hope that attendees will leave with more knowledge and ideas on how to evolve and improve their current operations and business.

Who’s the target audience for this track?

The potential audience for this track is pretty broad – it could be developers wanting a better understanding of the challenges with OpenStack and open source in a public cloud environment, operators looking for ideas and solutions for their businesses, or potential end users interested in seeing the benefits of using open-source solutions.

Cover photo // CC BY NC

The post Tips for talks on the Public Cloud Track for the Denver Summit appeared first on Superuser.

by Allison Price at January 17, 2019 03:00 PM

Trinh Nguyen

Viet OpenStack first webinar 5 Nov. 2018


Yesterday, 5 November 2018, at 21:00 UTC+7, about 25 Vietnamese developers attended the very first webinar of the Vietnam OpenStack User Group [1]. This is part of a series of Upstream Contribution Training based on the OpenStack Upstream Institute [2]. The topic was "How to contribute to OpenStack". Our target is to guide new and potential developers to understand the development process of OpenStack and how its projects are governed.

We planned to run the webinar in Google Hangouts, but the free version allows a maximum of only 10 people on a video call, so we decided to use Zoom [3]. Because Zoom limits free-account meetings to 45 minutes, we split the webinar into two sessions. Thanks to the proactive support of the Vietnam OpenStack User Group administrators, the webinar went very well. Whatever works.

I uploaded the training content to GitHub [4] and will update it based on attendees' feedback. Some of the feedback I got after the webinar:
  • Should have exercises
  • Find a more stable webinar tool
  • The training should happen earlier
  • The topics should be simpler for new contributors to follow
You can find the recorded videos of the webinar here:

Session 1: https://youtu.be/k3U7MjBNt-k




Session 2: https://youtu.be/nIkmIgTvfd4




We continue to gather feedback from the attendees and plan for the second webinar next month.

References:

[1] https://www.meetup.com/VietOpenStack/events/hpcglqyxpbhb/
[2] https://docs.openstack.org/upstream-training/upstream-training-content.html
[3] https://zoom.us
[4] https://github.com/dangtrinhnt/vietstack-webinars

by Trinh Nguyen (noreply@blogger.com) at January 17, 2019 02:22 AM

Viet OpenStack (now renamed Viet OpenInfa) second webinar 10 Dec. 2018



Yes, we did it: the second OpenStack Upstream Contribution webinar. This time we focused on debugging tips and tricks for first-time developers. We also had time to introduce some great tools such as Zuul CI [1] (and how to use the Zuul status page [2] to keep track of running tasks), the ARA report [3], and tox [4]. During the session, attendees shared some great experiences debugging OpenStack projects (e.g., how to read logs, how to use an IDE, etc.), and a lot of good questions were raised, such as how to use ipdb [7] to debug running services (using ipdb to debug is quite hardcore, I think :)). You can check out this GitHub link [5] for chat logs and other materials.

I want to say thanks to all the people at the Jitsi open source project [6], which provides a great conferencing platform. We were able to have a smooth video discussion without any limitation or interruption, and the experience was great.

Watch the recorded video here: https://youtu.be/rI2zPQYtX-g




References:

[1] https://zuul-ci.org/
[2] http://zuul.openstack.org/status
[3] http://logs.openstack.org/78/570078/10/check/neutron-grenade-multinode/303521d/ara-report/reports/
[4] https://tox.readthedocs.io/en/latest/
[5] https://github.com/dangtrinhnt/vietstack-webinars/tree/master/second
[6] https://jitsi.org/
[7] https://pypi.org/project/ipdb/

by Trinh Nguyen (noreply@blogger.com) at January 17, 2019 02:22 AM

January 16, 2019

OpenStack Superuser

Inside open infrastructure: The latest from the OpenStack Foundation

Welcome to the latest edition of the OpenStack Foundation Open Infrastructure Newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

Spotlight on… Zuul

Zuul, a pilot project supported by the OpenStack Foundation (OSF), is a suite of free/libre open source software that drives continuous integration, delivery and deployment (CI/CD) with a focus on project gating and coordinating changes across interrelated projects.

Zuul tests cross-project changes in parallel so users can easily validate changes to multiple systems together before landing a single patch.

Since 2012, Zuul has been proven at scale as a critical part of the OpenStack development process. In early 2018, version 3 was released and Zuul became an independently-governed effort, distinct from the OpenStack project. The third major release also marked a significant rewrite to improve general re-usability of Zuul outside of the OpenStack project and has seen adoption in organizations like BMW, leboncoin, GoDaddy and OpenLab. Many of Zuul’s users are also contributors, with development coming from the likes of Red Hat, BMW, GoDaddy, Huawei, GitHub and the OSF.

Zuul now supports code management through connection drivers for Gerrit, GitHub and generic Git remote repositories, with work underway to add Pagure. Since Zuul relies on Ansible for its job definition language, it can run builds on any operating system that Ansible can manage. Zuul’s resource pool manager, Nodepool, can manage workloads on resources dynamically provisioned through APIs for OpenStack and Kubernetes, or on separately-maintained “static” servers, and is working to add Red Hat OpenShift, Amazon Elastic Compute Cloud (AWS EC2), Microsoft Azure and VMware vSphere support to that list. Major Zuul design discussions currently underway include support for using multiple concurrent Ansible versions to run jobs from its executor, and distributing the resource scheduler to eliminate it as a single point of failure.

If you’re interested in trying out Zuul for yourself, check out these resources:

OpenStack Foundation news

  • The Call for Presentations is currently open for the Open Infrastructure Summit that is being held April 29-May 1 in Denver, Colorado. Check out the updated Summit tracks and submit your session by next week’s deadline: Wednesday, January 23 at 11:59 p.m. PT.
  • The Travel Support Program for the Denver Summit and PTG is also open. Submit your application before February 27 at 11:59 p.m. PT.
  • All OpenStack Foundation members received a link to vote in the Board of Directors Individual election and bylaws amendments this week. Check your email and cast your vote by this Friday, January 18, 2019 at 11:00 a.m. CST/1700 UTC.
  • The Diversity & Inclusion Working Group is conducting an anonymous survey to better understand the diversity and makeup of the community. Participation is appreciated so we can better understand and serve the community. Share any questions with working group chair, Amy Marrich (spotz on IRC).

OpenStack Foundation project news

OpenStack

  • The development of the upcoming OpenStack release reached the Stein-2 milestone last week. We now know what deliverables to expect in the final Stein release, planned for April 10.
  • It’s been a month since the OpenStack community switched back to using a single list for discussion, forming a single community of contributors. Please read Jeremy Stanley’s report to learn more.

Airship

  • The Airship Team continues to work towards the 1.0 release and invites comments and feedback. A developer and user feedback session to help new users become more engaged with 1.0 release is in the works. Details to come on the Airship mailing list.
  • A specification for leveraging Ironic as a bare metal driver has merged. Catch up on the full discussion by watching the recording of the January 10 Airship Design Meeting. Want to get involved with Airship design and learn more? The team meets every Thursday at 11:00 a.m CT for an open design meeting.

Kata Containers

  • Over the past several weeks the Kata team has been busy working on the 1.5 release, scheduled to land January 23. It will offer support for containerd v2, Firecracker and live upgrades. The 1.5 release candidate is available now for preview.
  • The Kata community has formed a new Marketing Content special interest group that will begin monthly meetings on January 16. Details are available in the #kata-marketing channel in the Slack group.

StarlingX

  • The first StarlingX Contributor Meetup is ongoing in Chandler, Arizona to cover both technical and community-related topics.
  • The community set up a mail report of StarlingX builds from CENGN to make sure any issue is corrected immediately.

Questions / feedback / contribute

This newsletter is edited by the OpenStack Foundation staff to highlight open infrastructure communities. We want to hear from you!
If you have feedback, news or stories that you want to share, reach us through community@openstack.org and to receive the newsletter, sign up here.

The post Inside open infrastructure: The latest from the OpenStack Foundation appeared first on Superuser.

by OpenStack Foundation at January 16, 2019 08:03 PM

Ben Nemec

OpenStack Virtual Baremetal Imported to OpenStack Infra

As foretold in a previous post, OVB has been imported to OpenStack Infra. The repo can now be found at https://git.openstack.org/cgit/openstack/openstack-virtual-baremetal. All future development will happen there so you should update any existing references you may have. In addition, changes will now be proposed via Gerrit instead of Github pull requests. \o/

For the moment, the core reviewer list is largely limited to the same people who had commit access to the Github repo. The TripleO PTL and one other have been added, but that will likely continue to change over time. The full list can be found here.

Because of the still-limited core list, not much about the approval process will change as a result of this import. I will continue to review and single-approve patches just like I did on Github. However, there are plans in the works to add CI gating to the repo (another benefit of the import) and once that has happened we will most likely open up the core reviewer list to a wider group.

Questions and comments via the usual channels.

by bnemec at January 16, 2019 06:18 PM

Trinh Nguyen

VietOpenInfra third webinar - 14th Jan. 2019



Yay, finally after the new year holiday we could organize the third upstream training webinar for OpenStack developers in Vietnam [1]. This time we invited Kendall Nelson [2], Upstream Developer Advocate for the OpenStack Foundation, to teach us about the Storyboard [3] and Launchpad [4] task management tools (she's also one of the core developers of the Storyboard project).

We first started with the Jitsi conferencing platform [5] but could not communicate with Kendall (in the US) for some reason. So we switched back to Zoom [6] and everything went well after that. About 12 people attended the webinar, and we had a good conversation with Kendall about some aspects of Storyboard, which is quite new to some users. You can check out the conversation (chat log) here [7]. Below is the recorded video:



We would like to say thanks to Kendall Nelson for kindly agreeing to teach us this time even though the schedule was pretty early for her (6 a.m. her time). We learned a lot from her presentation, and some in the audience even want to contribute to the Storyboard project (here is some low-hanging fruit to work on [9]).

P/S: You can follow this link [8] for the previous webinars.

References:

[1] https://www.meetup.com/VietOpenStack/events/257860457
[2] https://twitter.com/knelson92
[3] https://storyboard.openstack.org/
[4] https://launchpad.net/openstack
[5] https://www.meetup.com/VietOpenStack/events/257860457
[6] https://zoom.us
[7] https://github.com/dangtrinhnt/vietstack-webinars/tree/master/third
[8] https://github.com/dangtrinhnt/vietstack-webinars
[9] https://storyboard.openstack.org/#!/board/115

by Trinh Nguyen (noreply@blogger.com) at January 16, 2019 01:53 AM

January 15, 2019

OpenStack Superuser

Going by the Ansible playbook: How eBay Classifieds survived Spectre and Meltdown

Just about a year ago, the security community got a nasty wake-up call: Spectre and Meltdown.

Considered “pretty catastrophic” by experts, they were a series of vulnerabilities discovered by various security researchers around performance optimization techniques built into modern CPUs. Those optimizations (involving superscalar capabilities, out-of-order execution, and speculative branch prediction) essentially created a side-channel that could be exploited to deduce the content of computer memory that should normally not be accessible.

For e-commerce giant eBay, keeping the nightmares away was a particularly complex project. The eBay Classifieds Group has a private cloud distributed in two geographical regions (with future plans in the works for a third), around 1,000 hypervisors and a capacity of 80,000 cores. The team needed to patch hypervisors on four availability zones for each region with the latest kernel, KVM version and BIOS updates. During these updates the zones were unavailable and all the instances restarted automatically.

Bruno Bompastor and Adrian Joian, from eBay’s cloud reliability team, shared how shoring up their system against these vulnerabilities stretched from January until July. One of the takeaways? First, that Ansible is a great tool for infrastructure automation. “We decided to use Ansible as our main tool and heavily relied on Ansible roles as a way to organize tasks,” Bompastor says. As an example, the team has OpenStack roles, hardware roles, update roles and — the most important one for this project — the checker role, to scan for these vulnerabilities. They ran a checker on the host (an open-source script that basically tests the variants they wanted to check). Available on GitHub, “it’s a very nice script that covers everything like this…”
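As a rough sketch of how a checker role might consume such results in automation, here is a small parser. This assumes the open-source spectre-meltdown-checker script's `--batch json` output format (a list of objects with NAME, CVE and VULNERABLE fields); the sample data below is invented, not eBay's.

```python
import json

def vulnerable_cves(checker_json):
    """Return the CVE ids the checker flagged as vulnerable.

    Assumes the JSON batch format of spectre-meltdown-checker:
    a list of {"NAME", "CVE", "VULNERABLE", "INFOS"} objects.
    """
    return [entry["CVE"] for entry in json.loads(checker_json)
            if entry["VULNERABLE"]]

# Invented sample output for one host:
sample = json.dumps([
    {"NAME": "SPECTRE VARIANT 1", "CVE": "CVE-2017-5753",
     "VULNERABLE": False, "INFOS": "Mitigation: usercopy barriers"},
    {"NAME": "MELTDOWN", "CVE": "CVE-2017-5754",
     "VULNERABLE": True, "INFOS": "no PTI"},
])
print(vulnerable_cves(sample))  # ['CVE-2017-5754']
```

A playbook could gate the reboot-and-patch roles on whether this list is non-empty, so already-clean hypervisors are skipped.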

They also gave an inside look at all the work the team had to perform to successfully shut down, update and boot a fully patched infrastructure without data loss, and discussed how the team managed the SDN (Juniper Contrail) and LBaaS (Avi Networks) layers when restarting this massive number of cores.

Catch the whole case study on YouTube or the slides here.

The post Going by the Ansible playbook: How eBay Classifieds survived Spectre and Meltdown appeared first on Superuser.

by Superuser at January 15, 2019 05:06 PM

January 14, 2019

SUSE Conversations

Looking for a reason to attend SUSECON? I’ve got 5!

In today’s business environment, every company is a digital company. IT infrastructure needs to not only keep pace but also move fast enough to accommodate strategic business and technology initiatives such as cloud, mobile and the Internet of Things. At SUSECON 2019, see how our open, open source approach helps our customers and partners transform […]

The post Looking for a reason to attend SUSECON? I’ve got 5! appeared first on SUSE Communities.

by Kent Wimmer at January 14, 2019 08:58 PM

OpenStack Superuser

TripleO networks: From simplest to not-so-simple

TripleO (OpenStack On OpenStack) is a program aimed at installing, upgrading and operating OpenStack clouds using OpenStack’s own cloud facilities as the foundations – building on Nova, Neutron and Heat to automate fleet management at datacenter scale.

If you read the TripleO setup for network isolation, it lists eight distinct networks. Why does TripleO need so many networks? Let’s take it from the ground up.

WiFi to the workstation

I run Red Hat OpenStack Platform (OSP) Director, which is the productized version of TripleO.  Everything I say here should apply equally well to the upstream and downstream variants.

My setup has OSP Director running in a virtual machine (VM). To get that virtual machine set up requires network connectivity. I perform this via wireless, as I move around the house with my laptop. The workstation has a built-in wireless card.

Let’s start here: Director runs inside a virtual machine on the workstation.  It has complete access to the network interface card (NIC) via macvtap.  This NIC is attached to a Cisco Catalyst switch.  A wired cable to my laptop is also attached to the switch.  This allows me to set up and test the first stage of network connectivity:  SSH access to the virtual machine running in the workstation.

Provisioning network

The blue network here is the provisioning network.  This reflects two of the networks from the TripleO document:

  • IPMI* (IPMI System controller, iLO, DRAC)
  • Provisioning* (Undercloud control plane for deployment and management)

These two distinct roles can be served by the same network in my set up, and, in fact they must be.  Why?  Because my Dell servers have a NIC that acts as both the IPMI endpoint and is the only NIC that supports PXE.  Thus, unless I wanted to do some serious VLAN wizardry, and get the NIC to switch both (tough to debug during the setup stage), I’m better off with them both using untagged VLAN traffic.  This way, each server is allocated two static IPv4 addresses, one used for IPMI and one that will be assigned during the hardware provisioning.

Apologies for the acronym soup.  It bothers me, too.

Another way to think about the set of networks that you need is via DHCP traffic.  Since the IPMI cards are statically assigned their IP addresses, they don’t need a DHCP server.  But the hardware’s operating system will get its IP address from DHCP.  So it’s OK if these two functions share a network.

This doesn’t scale very well.  IPMI and IDrac can both support DHCP and that would be the better way to go in the future, but it’s beyond the scope of what I’m willing to mess with in my lab.

Deploying the overcloud

In order to deploy the overcloud, the director machine needs to perform two classes of network calls:

  1. SSH calls to the bare metal OS to launch the services, almost all of which are containers.  This is on the blue network above.
  2. HTTPS calls to the services running in those containers.  These services also need to be able to talk to each other.  This is on the yellow internal API network above.  (I didn’t color code “yellow” as you can’t read it.)

Internal (not) versus external

You might notice that my diagram has an additional network; the external API network is shown in red.

Provisioning and calling services are two very different use cases.  The most common API call in OpenStack is POST https://identity/v3/auth/token.  This call is made prior to any other call.  The second most common is the call to validate a token.  The create token call needs to be accessible from everywhere that OpenStack is used. The validate token call does not.  But, if the API server only listens on the same network that’s used for provisioning, that means the network is wide open; people who should only be able to access the OpenStack APIs have the capability to send network attacks against the IPMI cards.

To split this traffic, either the network APIs need to listen on both networks or the provisioning needs to happen on the external API network. Either way, both networks are going to be set up when the overcloud is deployed.
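To make those two calls concrete, here is a minimal sketch using only the standard library. The endpoint URL is hypothetical; the request shape follows the Identity v3 API, where the created token comes back in the X-Subject-Token response header.

```python
import json
import urllib.request

KEYSTONE = "https://identity.example.com"  # hypothetical endpoint

def password_auth_body(user, password, project, domain="default"):
    """Build the Identity v3 password-auth request body."""
    return {"auth": {
        "identity": {"methods": ["password"],
                     "password": {"user": {"name": user,
                                           "domain": {"id": domain},
                                           "password": password}}},
        "scope": {"project": {"name": project,
                              "domain": {"id": domain}}}}}

def create_token(user, password, project):
    """POST /v3/auth/token: must be reachable from everywhere
    OpenStack is used; the token arrives in X-Subject-Token."""
    req = urllib.request.Request(
        KEYSTONE + "/v3/auth/token",
        data=json.dumps(password_auth_body(user, password, project)).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.headers["X-Subject-Token"]

def validate_token(service_token, subject_token):
    """GET /v3/auth/token: only internal services need to reach this."""
    req = urllib.request.Request(
        KEYSTONE + "/v3/auth/token",
        headers={"X-Auth-Token": service_token,
                 "X-Subject-Token": subject_token})
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```

The point of the network split is exactly the difference between these two functions: the first must listen on the external API network, the second only needs the internal one.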

Thus, the red server represents the API servers that are running on the controller and the yellow server represents the internal agents that are running on the compute node.

Some Keystone history

When a user performs an action in the OpenStack system, they make an API call.  This request is processed by the web server running on the appropriate controller host.  There’s no difference between a Nova server requesting a token and a project member requesting a token, but these were seen as separate use cases and were put on separate network ports.  The internal traffic was on port 35357 and the project member traffic was on port 5000.

It turns out that running on two different ports of the same IP address doesn’t solve the problem people were trying to fix.  They wanted to limit API access via network, not by port.  Therefore, there really was no need for two different ports, but rather two different IP addresses.

This distinction still shows up in the Keystone service catalog, where endpoints are classified as external or internal.

Deploying and using a virtual machine

Now our diagram has become a little more complicated.  Let’s start with the newly added red laptop, attached to the external API network.  This system is used by our project member to create the new virtual machine via the compute create_server API call.

Here’s the order of how it works:

  1. The API call comes from the outside, travels over the red external API network to the Nova server (shown in red)
  2. The Nova server posts messages to the queue, which are eventually picked up and processed by the compute agent (shown in yellow).
  3. The compute agent talks back to the other API servers (also shown in red) to fetch images, create network ports and connect to storage volumes.
  4. The new VM (shown in green) is created and connects via an internal, non-routable IP address to the metadata server to fetch configuration data.
  5. The new VM is connected to the provider network (also shown in green.)

At this point, the VM is up and running.  If an end user wants to connect to it they can do so.  Obviously, the provider network doesn’t run all the way through the router to the end user’s system, but this path is the open-for-business network pathway.

Note that this is an instance of a provider network as Assaf Muller defined in his post.
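The numbered steps above can be modeled as a toy in a few lines of Python. The queue stands in for the real message bus, and all the names here are invented for illustration, not Nova's actual internals.

```python
import queue

# Toy model of the create_server flow: the API server only
# validates and enqueues; the compute agent does the real work.
message_bus = queue.Queue()  # stands in for the message queue

def nova_api_create_server(name, image, flavor):
    """Step 1: the external API call lands here (red network)."""
    request = {"name": name, "image": image, "flavor": flavor}
    message_bus.put(request)  # step 2: post to the queue
    return {"status": "BUILD", **request}

def compute_agent_poll():
    """Steps 3-4: the agent (yellow network) picks up the request,
    fetches the image, wires up ports and volumes, boots the VM."""
    request = message_bus.get()
    # ... internal API calls to image/network/storage go here ...
    return {"status": "ACTIVE", **request}

server = nova_api_create_server("vm1", "cirros", "m1.tiny")
print(server["status"])                # BUILD until the agent runs
print(compute_agent_poll()["status"])  # ACTIVE
```

The asymmetry in the model mirrors the network diagram: the API function is what listens on the red network, while the agent only ever talks on the yellow one.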

Tenant networks

Let’s say you’re not using a provider network.  How does that change the set up?  First, let’s re-label the green network as the external network.  Notice that the virtual machines don’t connect to it now.  Instead, they connect via the new purple networks.

Note that the purple networks connect to the external network in the network controller node, shown in purple on the bottom server.  This service plays the role of a router, converting the internal traffic on the tenant network to the external traffic.  This is where the floating IPs terminate and are mapped to an address on the internal network.

Wrap up

The TripleO network story has evolved to support a robust configuration that splits traffic into component segments.  The diagrams above attempt to pass along my understanding of how they work and why.

I’ve left off some of the story, as I don’t show the separate networks that can be used for storage.  I’ve collapsed the controllers and agents into a simple block to avoid confusing detail: my goal is accuracy, but here it sacrifices precision.  It also only shows a simple rack configuration, much like the one here in my office.  The concepts presented should allow you to understand how it would scale up to a larger deployment.  I expect to talk about that in the future as well.

I’ll be sure to update  this article with feedback. Please let me know what I got wrong and what I can state more clearly.

About the author

Adam Young is a cloud solutions architect at Red Hat, responsible for helping people develop their cloud strategies. He has been a long time core developer on Keystone, the authentication and authorization service for OpenStack. He’s also worked on various systems management tools, including the identity management component of Red Hat Enterprise Linux based on the FreeIPA technology. A 20-year industry veteran,  Young has contributed to multiple projects, products and solutions from Java based eCommerce web sites to Kernel modifications for Beowulf clustering. This post first appeared on his blog.

Cover photo // CC BY NC

The post TripleO networks: From simplest to not-so-simple appeared first on Superuser.

by Adam Young at January 14, 2019 03:01 PM

January 13, 2019

Chris Dent

etcd + placement + virt-install → compute

I've had a few persistent complaints in my four and a half years of working on OpenStack, but two that stand out are:

  • The use of RPC—with large complicated objects being passed around on a message bus—to make things happen. It's fragile, noisy, over-complicated, hard to manage, hard to debug, easy to get wrong, and leads to workarounds ("raise the timeout") that don't fix the core problem.

  • It's hard, because of the many and diverse things to do in such a large community, to spend adequate time reflecting, learning how things work, and learning new stuff.

So I decided to try a little project to address both and talk about it before it is anything worth bragging about. I reasoned that if I use the placement service to manage resources and etcd to share state, I could model a scheduler talking to one or more compute nodes. Not to do something so huge as replace nova (which has so much complexity because it does many complex things), but to explore the problem space.

Most of the initial work involved getting some simple etcd clients speaking to etcd and placement and mocking out the creation of fake VMs. After that I dropped the work because of the many and diverse things to do, leaving a note to myself to investigate using virt-install.

It took me nine months to come back to it, but over the course of a couple of hours on two or three days I had it booting VMs on multiple compute nodes.

In my little environment a compute node starts up, learns about its environment, and creates a resource provider and associated inventories representing the virtual CPUs, disk, and memory it has available. It then sits in a loop, watching an etcd key associated with itself.
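For illustration, that registration step can be sketched against the placement REST API. The payload shapes follow placement's POST /resource_providers and PUT /resource_providers/{uuid}/inventories calls, but the helper names here are my own invention; the real etcd-compute code may differ:

```python
import uuid

# Hypothetical helpers building the payloads a compute node would send
# to placement on startup.

def provider_payload(name):
    """Body for POST /resource_providers."""
    return {"name": name, "uuid": str(uuid.uuid4())}

def inventory_payload(vcpus, memory_mb, disk_gb, generation=0):
    """Body for PUT /resource_providers/{uuid}/inventories."""
    return {
        "resource_provider_generation": generation,
        "inventories": {
            "VCPU": {"total": vcpus},
            "MEMORY_MB": {"total": memory_mb},
            "DISK_GB": {"total": disk_gb},
        },
    }

payload = inventory_payload(vcpus=4, memory_mb=4096, disk_gb=40)
```

With payloads like these, the compute node would POST the provider once and then PUT its inventories, re-reading the provider generation and retrying if placement reports a conflict.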

Alongside the compute process there's a faked-out metadata server running.

A scheduler takes a resource request and asks placement for a list of allocation candidates. The first candidate is selected, an allocation is made for the resources, and the allocation plus an image URL are put to the etcd key that the compute node is watching.

The compute node sees the change on the watched key, fetches the image, resizes it to the allocated disk size, then boots it with virt-install using the allocated vcpus and memory. When the VM is up, another key is set in etcd containing the IP of the created instance.

If the metadata server has been configured with an ssh public key, and the booted image looks for the metadata server, you can ssh into the booted VM using that key. For now it is only from the same host as the compute-node. Real networking is left as an exercise to the reader.
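The scheduler half of that flow can be sketched like this. The request and payload shapes follow placement's GET /allocation_candidates and PUT /allocations/{consumer_uuid} API; the helper names and the surrounding code are hypothetical, not the actual etcd-compute source:

```python
# Sketch of the scheduler step: build the allocation_candidates query
# from a resource request, then turn the first candidate into an
# allocation body for PUT /allocations/{consumer_uuid}.

def candidates_url(resources, host="http://localhost:8000"):
    """e.g. .../allocation_candidates?resources=DISK_GB:10,MEMORY_MB:1024,VCPU:2"""
    spec = ",".join("%s:%d" % pair for pair in sorted(resources.items()))
    return "%s/allocation_candidates?resources=%s" % (host, spec)

def allocation_body(candidate, project_id, user_id):
    """candidate is one entry from allocation_requests in the response."""
    return {
        "allocations": candidate["allocations"],
        "project_id": project_id,
        "user_id": user_id,
    }

url = candidates_url({"VCPU": 2, "MEMORY_MB": 1024, "DISK_GB": 10})
```

An allocation request taken verbatim from the candidates response can be PUT back essentially unmodified, which is what makes "pick the first candidate" such a small amount of scheduler code.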

The work described in those ↑ short paragraphs involved more learning about the fundamentals of creating a virtual machine than a few years of reading and reviewing inscrutable nova code. I should have done this much sooner.

The really messy code is in etcd-compute on GitHub.

by Chris Dent at January 13, 2019 09:00 PM

January 11, 2019

Ben Nemec

Debugging a Segfault in oslo.privsep

I recently helped track down a bug exposed by a recent oslo.privsep release that added threading to allow parallel privileged calls. It was a segfault happening in the privsep daemon that was caused by a C call in a privileged Neutron module. This, as you might expect, was a little tricky to debug so I thought I'd document the process for posterity.

There were a couple of reasons this was tough. First, it was a segfault, which meant something went wrong in the underlying C code. Python debuggers need not apply. Second, there's a bunch of forking that happens to start the privsep daemon, which meant I couldn't just run Python in gdb. Well, maybe I could have, but my gdb skills are not strong enough to navigate through a bunch of different forks.

To get gdb attached to the correct process, I followed Python's debugging-with-gdb instructions, specifically those for attaching to an existing process. To make sure I had time to get it attached, I added a sleep to the startup of the privsep daemon installed in my Neutron tox venv. Essentially I would run the test:

tox -e dsvm-functional -- neutron.tests.functional.agent.linux.test_netlink_lib.NetlinkLibTestCase.test_list_entries

Then I would find the privsep-helper process that was eventually started and attach gdb to it with:

gdb python [pid]

I also needed to install some debuginfo packages on my system to get useful tracebacks from the libraries involved. Gdb gave me the install command to do so, which was handy. I believe the important part here was dnf debuginfo-install libnetfilter_conntrack, but that will vary depending on what you're debugging.

Once gdb was attached, I typed c to tell it to continue (gdb interrupts the process when you attach), then once the segfault happened I used commands like bt, list, and print to examine the code and state where the crash happened. This allowed me to determine that we were passing in a bad pointer as one of the parameters for the C call. It turned out we were truncating pointers because we hadn't specified the proper parameter and return types, so large memory addresses were being squeezed into ints that were too small to hold them. Why the oslo.privsep threading change exposed this I don't know, but my guess is that it has something to do with the address space changing when the calls were made from a thread instead of the main process.
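That truncation is easy to picture with ctypes, the usual way Python calls into C: by default ctypes treats every foreign function as returning a C int, so on a 64-bit system a returned pointer can be silently cut down. Declaring the parameter and return types, as the fix did, avoids it. A minimal illustration using malloc rather than the actual Neutron code:

```python
import ctypes

# Load the C library; CDLL(None) works on Linux.
libc = ctypes.CDLL(None)

# Without these declarations ctypes assumes malloc returns a C int,
# which can truncate a 64-bit pointer to 32 bits -- the same class of
# bug described above. Declaring the types preserves the full address.
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

ptr = libc.malloc(64)
assert ptr is not None  # with c_void_p the full address survives
libc.free(ptr)
```

The lesson generalizes: any time Python (or privsep) crosses into C, the foreign function's signature has to be spelled out, or large addresses get squeezed through the default int conversion.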

In any case, after quite a bit of cooperative debugging in the OpenStack community and a fair amount of rust removal from my gdb skills, we were able to resolve this bug and unblock the use of threaded oslo.privsep. This should allow us to significantly reduce the attack surface for OpenStack services, resulting in much better security.

I hope this was useful, and as always if you have any questions or comments don't hesitate to contact me.

by bnemec at January 11, 2019 09:24 PM

OpenStack Superuser

Buffer your open infrastructure knowledge: Upcoming training & webinars

It’s January. The time of year when the mind turns to calls for papers. Lifelong learning. And binge watching home organization shows rather than actually decluttering. (Maybe just me on that last one?)

Here are top picks for free or low-cost upcoming learning opps. If you’ve recently done a webinar and want to share your takeaways about it for Superuser, remember that contributed posts earn you Active User Contributor status.

Intent-based network load balancer automation and Ansible

In this session and live demo from Red Hat, you’ll learn from actual customer use cases how Ansible automation modules can configure network functions, automate load balancer clusters and create L4-L7 configurations.
Details here.

Storyboard and Launchpad

The Vietnam Open Infrastructure Community’s first meetup (and webinar) of the year will feature the OpenStack Foundation’s Kendall Nelson talking about StoryBoard and Launchpad. StoryBoard is a web application for task tracking across inter-related projects. Launchpad is a web application and website that allows users to develop and maintain software.
Details here.

Using NetApp Trident with cloud volumes ONTAP for provisioning Kubernetes persistent volumes

The webinar will show you how to provision persistent volumes for Kubernetes using NetApp’s Trident and Cloud Volumes ONTAP. NetApp Trident is a dynamic storage provisioner that leverages Cloud Volumes ONTAP for Kubernetes persistent volume claims. Trident is a fully supported open source project maintained by NetApp. Details here.

Kubernetes and Istio Security

Gadi Naor, CTO and co-founder at startup Alcide, will cover basic Istio security features, managing and restricting traffic to external services with Kubernetes and Istio network policies, and spotting security anomalies, as well as some interesting use cases. Details here.

Modernize your data center through open source

Cumulus’ co-founder and current CTO, JR Rivers, will discuss the growth of the open source community and how Cumulus is helping bring open source principles into modern data center networks. He’ll dive into some of the company’s contributions to the open source community such as Open Network Install Environment (ONIE), ifupdown2, VRF for Linux and FRRouting. Details here.

SUSE Expert Days 2019

Offered in more than 80 cities worldwide, the SUSE Expert Days tour offers a free day of technical discussions, presentations and demos. The theme? Open. Redefined. Participants will learn how to:

  • Transform IT infrastructure
  • Create a more agile business
  • Make room for innovation

Events kick off in January for Europe and in February for Latin America and North America. Full list of events here.

The post Buffer your open infrastructure knowledge: Upcoming training & webinars appeared first on Superuser.

by Nicole Martinelli at January 11, 2019 05:03 PM

Chris Dent

Placement Update 19-01

Hello! Here's placement update 19-01. Not a ton to report this week, so this will mostly be updating the lists provided last week.

Most Important

As mentioned last week, there will be a meeting next week to discuss what is left before we can pull the trigger on deleting the placement code from nova. Wednesday is looking like a good day, perhaps at 1700UTC, but we'll need to confirm that on Monday when more people are around. Feel free to respond on this thread if that won't work for you (and suggest an alternative).

Since deleting the code is dependent on deployment tooling being able to handle extracted placement (and upgrades to it), reviewing that work is important (see below).

What's Changed

  • It was nova's spec freeze this week, so a lot of effort was spent getting some specs reviewed and merged. That's reflected in the shorter specs section, below.

  • Placement had a release and was published to pypi. This was a good excuse to write (yet another) blog post on how easy it is to play with.

Bugs

Specs

With spec freeze this week, this will be the last time we'll see this section until near the end of this cycle. Only one of the specs listed last week merged (placement for counting quota).

Main Themes

Making Nested Useful

I've been saying for a few weeks that "progress continues on gpu-reshaping for libvirt and xen" but it looks like the work at:

is actually stalled. Anyone have some insight on the status of that work?

Also making use of nested is bandwidth-resource-provider:

There's a review guide for those patches.

Eric's in the process of doing lots of cleanups to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

Extraction

Besides the meeting mentioned above, I've refactored the extraction etherpad to make a new version that has less noise in it so the required actions are a bit more clear.

The tasks remain much the same as mentioned last week: the reshaper work mentioned above and the work to get deployment tools operating with an extracted placement:

Loci's change to have an extracted placement has merged.

Kolla has a patch to include the upgrade script. It raises the question of how or if the mysql-migrate-db.sh should be distributed. Should it maybe end up in the pypi distribution?

(The rest of this section is duplicated from last week.)

Documentation tuneups:

Other

There are still 13 open changes in placement itself. Most of the time critical work is happening elsewhere (notably the deployment tool changes listed above).

Of those placement changes, the database-related ones from Tetsuro are the most important.

Outside of placement:

End

If anyone has submitted, or is planning to submit, a placement-related proposal for summit, it would be great to hear about it. I had thought about doing a resilient placement in kubernetes with cockroachdb for the edge sort of thing, but then realized my motivations were suspect and I have enough to do otherwise.

by Chris Dent at January 11, 2019 03:43 PM

January 10, 2019

Ben Nemec

OpenStack Virtual Baremetal Master is Now 2.0-dev

As promised in my previous update on OVB, the 2.0-dev branch has been merged to master. If this breaks you, switch to the stable/1.0 branch, which is the same as master was prior to the 2.0-dev merge. Note that this does not mean OVB is officially 2.0 yet. I've found a couple more deprecated things that need to be removed before we declare 2.0. That will likely happen soon though.

by bnemec at January 10, 2019 08:30 PM

OpenStack Superuser

Tips for talks on the Open Development Track for the Denver Summit

Time to harness those brainstorms: The call for presentations for the first Open Infrastructure Summit is open until January 23. OpenStack Summit veterans will notice some changes in the event beyond the name. In order to reflect the diversity of projects, use cases and software development in the ecosystem, several conference tracks have been added or renamed.

Historically, just 25 percent of submissions are chosen for the conference. In light of that fierce competition, Superuser is talking to Programming Committee members of the Tracks for the Denver Summit to help shed light on what topics, takeaways and projects they are hoping to see covered in sessions submitted to their Track.
Here we’re featuring the Open Development Track, formerly the Open Source Community Track. Thierry Carrez, VP of engineering for the OpenStack Foundation, and Allison Randal, board member of the OpenStack Foundation, are tasked with leading selections for this Track. They shared these insights ahead of the submission deadline.

Open Development Track topics

The 4 Opens, the future of free and open source software, challenges of open collaboration, open development best practices and tools, open source governance models, diversity and inclusion, mentoring and outreach, community management.

Describe the content you’d like to see submitted to this Track.

Today, open source licensing is not enough; we need to define standards on how open source software should be built. Models of open development come with their benefits and challenges, and with their best practices and tools. I’d like this track to be where those open development models, those standards on how open source should be built, are discussed. Beyond that, this track will cover open source governance models, the challenges of diversity and inclusion, the need for mentoring and outreach, and community management topics.

What should Summit attendees take away from sessions in this Track?

Too much of open source software development today is closed, one way or another. Its development may be done behind closed doors, or its governance may be locked down to ensure lasting control by its main sponsor. I hope that this track will expose the benefits of open collaboration and help users tell the difference between different degrees of openness. I also hope this track will explain why diversity is critical to the success of open source projects and inspire attendees to participate in mentoring and outreach.

Who’s the target audience for this track?

This Track is broadly applicable to anyone who participates in an open source project, as a designer, operator, developer, community member, or sponsor. No specialist background knowledge is required; you’ll gain value from the sessions even if you are completely new to open collaboration. Experienced community leaders will benefit from exchanging ideas and best practices across communities, while new community leaders, or anyone curious about getting involved in community leadership, will benefit from the experiences of those who have gone before them.

The post Tips for talks on the Open Development Track for the Denver Summit appeared first on Superuser.

by Allison Price at January 10, 2019 03:03 PM

Trinh Nguyen

Searchlight at Stein-2 (R-14 & R-13)


Finally, we have reached the Stein-2 milestone. Over the past three months, we have been working on clarifying the use cases of Searchlight as well as envisioning a sustainable future for it. We decided to make the project a multi-cloud application that can provide search capability across multiple cloud orchestration platforms (e.g., AWS, Azure, K8S, etc.). The effort was made possible by the contributions of Thuy Dang (our newest core member) and Sa Pham [3]. You can check out the documents at [4].

The projects are versioned as follows:
  • searchlight: 6.0.0.0b2
  • searchlight-ui: 6.0.0.0b2
python-searchlightclient wasn't released in this milestone because there were no big changes.

Here are the major changes included in this release:
  • Searchlight use cases and vision [1]
  • A bug fix [2]
Let's get searching!

References:

[1] https://review.openstack.org/#/c/629104/
[2] https://review.openstack.org/#/c/621996/
[3] https://review.openstack.org/#/c/629471/
[4] https://docs.openstack.org/searchlight/latest/user/usecases.html

by Trinh Nguyen (noreply@blogger.com) at January 10, 2019 06:31 AM

January 09, 2019

OpenStack Superuser

How open infrastructure drives the “empowered edge”

Edge computing is at the top of tech bingo terms for yet another year. Gartner Inc. recently announced that it expects that in the next five years, “empowered edge” will be moving everything from internet of things to 5G: “Cloud computing and edge computing will evolve as complementary models with cloud services being managed as a centralized service executing, not only on centralized servers, but in distributed servers on-premises and on the edge devices themselves.”

Here’s a roundup of Superuser articles pushing those edgy boundaries.

The last mile: Where edge meets containers

Learn more about StarlingX: The edge project taking flight

How open source projects are pushing the shift to edge computing

Where the cloud native approach is taking NFV architecture for 5G

Carnegie Mellon’s clear view on 5G cloudlets

Forecasting the future of cloud computing in China

Get involved

If you’d like to know more about edge initiatives, check out the Edge Working Group resources. You can also dial in to the Edge WG weekly calls or the weekly use cases calls, check the StarlingX sub-project team calls, find further material on the website about how to contribute, or jump on IRC for OpenStack project team meetings in the area of your interest. And if you’re working with edge and want to write about it (unlocking Active User Contributor status as well!) just email editorATopenstack.org

The post How open infrastructure drives the “empowered edge” appeared first on Superuser.

by Superuser at January 09, 2019 03:01 PM

January 08, 2019

OpenStack Superuser

Where to connect with OpenStack in 2019

Every year, thousands of Stackers connect in real life to share knowledge about open infrastructure at local meetups, OpenStack Days, the Summits and hackathons.

Start filling out your 2019 calendar at the events page or find your local folks on the OpenStack Foundation (OSF) Meetup page.

Right out of the gate in January, user groups from New Delhi to Munich are putting their heads together. You’ll also find community members answering questions from a booth at the fun-packed FOSDEM in Brussels, February 2-3. (For what to expect, check out our coverage here.) There are a host of spring events – consider checking out OpenInfraDays in London on April 1. It will cover all things cloud computing, from the latest developments in bare metal hardware infrastructure through to scaling scientific computing workloads.

The next milestone: the first-ever Open Infrastructure Summit, taking place April 29-May 1 in Denver. Remember to plan your trip with an eye on the Project Teams Gathering (PTG), May 2-4, 2019, also in Denver. It’s an event for anyone who self-identifies as a member of a given project team, as well as operators who are specialists on a given project and willing to spend their time giving feedback on their use cases and contributing their usage experience to the project team. Wondering what it’s like when the “zoo” of OpenStack projects get together? Check out this write-up.

Then there’s an entire summer of global get-togethers that will be focusing on open infrastructure. (The calendar, much like the technology, is constantly evolving, so keep an eye on that events page as it fills up for the year.)

Once you’ve saved the date, remember that Superuser is always interested in community content – get in touch at editorATopenstack.org and send us your write-ups (and photos!) of events. Articles posted are eligible for Active User Contributor (AUC) status.

Cover Photo // CC BY NC

The post Where to connect with OpenStack in 2019 appeared first on Superuser.

by Superuser at January 08, 2019 03:09 PM

January 07, 2019

John Likes OpenStack

ceph-ansible podman with vagrant

These are just my notes on how I got vagrant with libvirt working on CentOS7 and then used ceph-ansible's Fedora 29 podman tests to deploy a containerized Ceph Nautilus preview cluster without Docker. I'm doing this in hopes of hooking Ceph into the new podman TripleO deploys.

Configure Vagrant with libvirt on CentOS7

I already have a CentOS7 machine I used for tripleo quickstart. I did the following to get vagrant working on it with libvirt.

1. Create a vagrant user


sudo useradd vagrant
sudo usermod -aG wheel vagrant
sudo usermod --append --groups libvirt vagrant
sudo su - vagrant
mkdir .ssh
chmod 700 .ssh/
cd .ssh/
curl https://github.com/fultonj.keys > authorized_keys
chmod 600 authorized_keys

Continue as the vagrant user.

2. Install the Vagrant and other RPMs

Download the CentOS Vagrant RPM from https://www.vagrantup.com/downloads.html and install other RPMs needed for it to work with libvirt.

sudo yum install vagrant_2.2.2_x86_64.rpm
sudo yum install qemu libvirt libvirt-devel ruby-devel gcc qemu-kvm
vagrant plugin install vagrant-libvirt

Note that I already had many of the libvirt deps above from quickstart.

3. Get a CentOS7 box for verification

Download the centos/7 box.

[vagrant@hamfast ~]$ vagrant box add centos/7
==> box: Loading metadata for box 'centos/7'
box: URL: https://vagrantcloud.com/centos/7
This box can work with multiple providers! The providers that it
can work with are listed below. Please review the list and choose
the provider you will be working with.

1) hyperv
2) libvirt
3) virtualbox
4) vmware_desktop

Enter your choice: 2
==> box: Adding box 'centos/7' (v1811.02) for provider: libvirt
box: Downloading: https://vagrantcloud.com/centos/boxes/7/versions/1811.02/providers/libvirt.box
box: Download redirected to host: cloud.centos.org
==> box: Successfully added box 'centos/7' (v1811.02) for 'libvirt'!
[vagrant@hamfast ~]$
Create a Vagrantfile for it with `vagrant init centos/7`.

4. Configure Vagrant to use a custom storage pool (Optional)

Because I was already using libvirt directly with an images pool, vagrant was unable to download the centos/7 system. I like this as I want to keep my images pool separate for when I use libvirt directly. To make Vagrant happy I created my own pool for it and added the following to my Vagrantfile:


Vagrant.configure("2") do |config|
config.vm.provider :libvirt do |libvirt|
libvirt.storage_pool_name = "vagrant_images"
end
end

After doing the above `vagrant up` worked for me.

Run ceph-ansible's Fedora 29 podman tests

1. Clone ceph-ansible master

git clone git@github.com:ceph/ceph-ansible.git; cd ceph-ansible

2. Satisfy dependencies


sudo pip install -r requirements.txt
sudo pip install tox
cp vagrant_variables.yml.sample vagrant_variables.yml
cp site.yml.sample site.yml

Optionally: modify Vagrantfile for vagrant_images storage pool

3. Deploy with the container_podman tox environment


tox -e dev-container_podman -- --provider=libvirt

The above will result in tox triggering vagrant to create the virtual machines, and then ceph-ansible will install Ceph on them.

4. Inspect Deployment

Verify the virtual machines are running:


[vagrant@hamfast ~]$ cd ~/ceph-ansible/tests/functional/fedora/29/container-podman
[vagrant@hamfast container-podman]$ cp vagrant_variables.yml.sample vagrant_variables.yml
[vagrant@hamfast container-podman]$ vagrant status
Current machine states:

mgr0 running (libvirt)
client0 running (libvirt)
client1 running (libvirt)
rgw0 running (libvirt)
mds0 running (libvirt)
rbd-mirror0 running (libvirt)
iscsi-gw0 running (libvirt)
mon0 running (libvirt)
mon1 running (libvirt)
mon2 running (libvirt)
osd0 running (libvirt)
osd1 running (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
[vagrant@hamfast container-podman]$

Connect to a monitor and see that it's running Ceph containers


[vagrant@hamfast container-podman]$ vagrant ssh mon0
Last login: Mon Jan 7 17:11:28 2019 from 192.168.121.1
[vagrant@mon0 ~]$

[vagrant@mon0 ~]$ sudo podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c494695eb0c2 docker.io/ceph/daemon:latest-master /opt/ceph-container... 4 hours ago Up 4 hours ago ceph-mgr-mon0
dbabf02df984 docker.io/ceph/daemon:latest-master /opt/ceph-container... 4 hours ago Up 4 hours ago ceph-mon-mon0
[vagrant@mon0 ~]$

[vagrant@mon0 ~]$ sudo podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/ceph/daemon latest-master 24fdc8c3cb3f 4 weeks ago 726MB
[vagrant@mon0 ~]$

[vagrant@mon0 ~]$ which docker
/usr/bin/which: no docker in (/home/vagrant/.local/bin:/home/vagrant/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
[vagrant@mon0 ~]$
Observe the status of the Ceph cluster:

[vagrant@mon0 ~]$ sudo podman exec dbabf02df984 ceph -s
cluster:
id: 9d2599f2-aec7-4c7c-a88e-7a8d39ebb557
health: HEALTH_WARN
application not enabled on 1 pool(s)

services:
mon: 3 daemons, quorum mon0,mon1,mon2 (age 71m)
mgr: mon1(active, since 70m), standbys: mon2, mon0
mds: cephfs-1/1/1 up {0=mds0=up:active}
osd: 4 osds: 4 up (since 68m), 4 in (since 68m)
rbd-mirror: 1 daemon active
rgw: 1 daemon active

data:
pools: 13 pools, 124 pgs
objects: 194 objects, 3.5 KiB
usage: 54 GiB used, 71 GiB / 125 GiB avail
pgs: 124 active+clean

[vagrant@mon0 ~]$

Observe the installed versions:


[vagrant@mon0 ~]$ sudo podman exec -ti dbabf02df984 /bin/bash
[root@mon0 /]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[root@mon0 /]#

[root@mon0 /]# ceph --version
ceph version 14.0.1-1496-gaf96e16 (af96e16271b620ab87570b1190585fffc06daeac) nautilus (dev)
[root@mon0 /]#

Observe the OSDs


[vagrant@hamfast container-podman]$ vagrant ssh osd0
[vagrant@osd0 ~]$ sudo su -
[root@osd0 ~]# podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4fe23502592c docker.io/ceph/daemon:latest-master /opt/ceph-container... About an hour ago Up About an hour ago ceph-osd-2
f582b4311076 docker.io/ceph/daemon:latest-master /opt/ceph-container... About an hour ago Up About an hour ago ceph-osd-0
[root@osd0 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 50G 0 disk
sdb 8:16 0 50G 0 disk
├─test_group-data--lv1 253:1 0 25G 0 lvm
└─test_group-data--lv2 253:2 0 12.5G 0 lvm
sdc 8:32 0 50G 0 disk
├─sdc1 8:33 0 25G 0 part
└─sdc2 8:34 0 25G 0 part
└─journals-journal1 253:3 0 25G 0 lvm
vda 252:0 0 41G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 40G 0 part
└─atomicos-root 253:0 0 40G 0 lvm /sysroot
[root@osd0 ~]#

[root@osd0 ~]# podman exec 4fe23502592c cat var/lib/ceph/osd/ceph-2/type
bluestore
[root@osd0 ~]#

by John (noreply@blogger.com) at January 07, 2019 08:07 PM

Chris Dent

Placement From PyPI

Today the OpenStack placement service has reached something of a milestone: It is now possible to install it from PyPI and make it do its thing. Here's a quick shell script that demonstrates (without keystone). It requires a working python3 development environment but is otherwise self-contained.

Because keystone is not being used, requests need to fake authentication by adding a header:

curl -H 'x-auth-token: admin' http://localhost:8000/resource_providers

To experience the full API you'll want to be using the latest microversion. You have to opt-in to that, also with a header:

curl -H 'x-auth-token: admin' \
     -H 'openstack-api-version: placement latest' \
     http://localhost:8000/resource_providers
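The same headers can be set from Python's standard library too; a small sketch assuming the gist's localhost setup:

```python
import urllib.request

def placement_request(path, host="http://localhost:8000"):
    # Carry the same two headers used in the curl examples above.
    req = urllib.request.Request(host + path)
    req.add_header("x-auth-token", "admin")
    req.add_header("openstack-api-version", "placement latest")
    return req

req = placement_request("/resource_providers")
# With the service from the gist running:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```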

In that gist, note the lack of a configuration file and the general sense of "is that all?" That's on purpose. Placement is a web app that sits on top of a database. It shouldn't be any more complicated to run (or scale) than any other run-of-the-mill Python-based WSGI application, because that's all it is.

Trimming Placement to this level was basically a process of finding things that were superfluous, and either removing or making them not required. A lot of the discovery for that was done in my placement container playground series. Most of it was related to configuration management.

As noted in the gist above, there's at least one more area to trim: removing the db sync. Having this as a separate step is useful for many deployment scenarios, but for others it's yet another thing to remember. Having lots of things to remember is one of the things that has always bothered me about OpenStack services.

I hope that as placement evolves we can keep it simple with as few moving parts and things to remember as possible. One way to help that is to make sure it never becomes more than a web app over a persistence layer.

by Chris Dent at January 07, 2019 04:50 PM

OpenStack Superuser

Spotify’s keen ear for open source

SEATTLE — Whether your playlist runs more to Knuckle Puck than Bing Crosby, if you’re streaming on Spotify it’s in part thanks to open source.

Saunak Chakrabarti, the director of engineering, infrastructure and operations at Spotify, shared details of current and future open source efforts at the press and analyst briefing at the recent KubeCon + CloudNativeCon North America.

With some 191 million users, including 87 million subscribers across 78 markets, “building Spotify would not have been possible without free and open source software,” declares the company website. You can check out the company’s GitHub page to use their code or contribute.

The music streaming platform, founded in 2008, was an early adopter of micro-service technology and has been running these services at scale for quite a few years, Chakrabarti says. Starting early has given them an interesting perspective on the evolution of cloud native.

“The last few years as the cloud native community has grown and matured, we’ve moved over from the home-grown solutions that we had implemented because there were no great alternatives in the open source community or the vendor space,” he says. They started with Helios, a Docker orchestration platform for deploying and managing containers across an entire fleet of servers, that launched around the same time as Kubernetes. The majority of their data scheduling jobs now run on Kubernetes as well as some tier one services. Without getting into the specific numbers, Chakrabarti says that the open-source container-orchestration system is employed across a variety of contexts at Spotify.

As they move towards adopting solutions like Kubernetes and Istio, it’s a “really exciting time for us because we can take these micro-services that we’ve had running in production for a long time that align with some great industry supported tools and solutions.” For now, they have a small fraction of production services running on Kubernetes and a handful of production services on Istio with GRPC endpoints. While he says they’ve done a few experiments, it’s “still early stages on GRPC and Istio, but we’re farther along with Kubernetes adoption and migration.”

In terms of whether Spotify has a strategy in place for using open source, Chakrabarti says teams have a lot of autonomy to experiment with any tech that they wish.

“But when we’re talking about production services, we want to limit fragmentation. From that perspective in the last couple of years we’ve developed a strategy that’s very open-source focused,” he says. As an organization and a tech platform, they’re thinking more about how to contribute to and how to adopt more open source projects rather than building what they need in-house.

The Linux Foundation provided travel and accommodation to KubeCon.

Cover Photo // CC BY NC

The post Spotify’s keen ear for open source appeared first on Superuser.

by Nicole Martinelli at January 07, 2019 03:07 PM

January 04, 2019

John Likes OpenStack

Simulate edge deployments using TripleO Standalone

My colleagues presented at OpenStack Summit Berlin on Distributed Hyperconvergence. This includes using TripleO to deploy a central controller node, extracting information from that central node, and then passing that information as input to a second TripleO deployment at a remote location ("on the edge of the network"). This edge deployment could host its own Ceph cluster which is collocated with compute nodes in its own availability zone. A third TripleO deployment could be added for a second remote edge deployment and users could then use the central deployment to schedule workloads per availability zone closer to where the workloads are needed.

You can simulate this type of deployment today with a single hypervisor and TripleO's standalone installer as per the newly merged upstream docs.

by John (noreply@blogger.com) at January 04, 2019 07:38 PM

Chris Dent

Placement Update 19-00

Welcome to the first placement update of 2019. May all your placements have sufficient resources this year.

Most Important

A few different people have mentioned that we're approaching crunch time on pulling the trigger on deleting the placement code from nova. The week of the 14th there will be a meeting to iron out the details of what needs to be done prior to that. If this is important to you, watch out for an announcement of when it will be. This is a separate issue from putting placement under its own governance, but some of the requirements declared for that, notably a deployment tool demonstrating an upgrade from placement-in-nova to placement-alone, are relevant.

Therefore, reviewing and tracking the deployment tool related work remains critical. Those are listed below.

Also, it is spec freeze next week. There are quite a lot of specs that are relevant to placement and scheduling that are close, but not quite. Mel has sent out an email about which specs most need attention.

What's Changed

  • There's an os-resource-classes now, already merged in placement, with a change posted for nova. It's effectively the same as os-traits, but for resource classes.

  • There's a 0.1.0 release of placement pending. This won't have complete documentation, but will mean that there's an actually usable openstack-placement on PyPI, with what we expect to be the final python module requirements.

  • This has been true for a while, but it seems worth mentioning, via coreycb: "you can install placement-api on bionic with the stein cloud archive enabled".

  • A db stamp command has been added to the placement-manage tool, which makes it possible for someone who has migrated their data from nova to say "I'm at version X".

  • placement functional tests have been removed from nova.

  • Matt did a mess of work to make initializing the scheduler report client in nova less expensive and redundant.

  • Improving the handling of allocation ratios has merged, allowing for "initial allocation ratios".

Bugs

Specs

Spec freeze next week! Only one of the previously listed specs has merged since early December and a new one has been added (at the end).

Main Themes

Making Nested Useful

Progress continues on gpu-reshaping for libvirt and xen:

Also making use of nested is bandwidth-resource-provider:

There's a review guide for those patches.

Eric's in the process of doing lots of cleanups to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

Extraction

The extraction etherpad is starting to contain more strikethrough text than not. Progress is being made. I'll refactor that soon so it is more readable, before the week of the 14th meeting.

The main tasks are the reshaper work mentioned above and the work to get deployment tools operating with an extracted placement:

Loci's change to have an extracted placement has merged.

Kolla has a patch to include the upgrade script. It raises the question of how, or whether, the mysql-migrate-db.sh script should be distributed. Should it maybe end up in the pypi distribution?

Documentation tuneups:

Other

There are currently 13 open changes in placement itself. Most of the time critical work is happening elsewhere (notably the deployment tool changes listed above).

Of those placement changes, the database-related ones from Tetsuro are the most important.

Outside of placement:

End

Lots of good work in progress. Our main task is making sure it all gets reviewed and merged. The sooner we do, the sooner people get to use it and find all the bugs we're sure to have left lying around.

by Chris Dent at January 04, 2019 03:24 PM

Trinh Nguyen

VietOpenInfra Meetup #20 29th Dec. 2018


Last Saturday, in an effort to advocate for the open infrastructure initiative, I went back to Ho Chi Minh City, Viet Nam, to organize the 20th meetup of the VietOpenInfra group. Usually, the event takes place in Ha Noi, where most of the Vietnamese OpenStackers are. But this time, we wanted to expand the community to the South because we know there are a lot of open infrastructure enthusiasts here, and this could be a great chance for us to strengthen the community across the country.



There were about 15 people sitting in a nice conference room of a coffee shop sharing what they are working on. One member of the VietOpenInfra board traveled here to discuss what the group achieved last year (2018). I also gave a talk about my plan for OpenStack Searchlight and what it means to build a universal search interface for the cloud. The disaster recovery and k8s topics also got huge attention from the audience when the speakers shared some interesting real-life experience.

Even though only a few people attended the meetup, we got positive feedback and comments from the community. This was our last event of 2018 and it prepared us for the new challenges and opportunities of 2019. Hopefully, we can keep up the energy and introduce open infrastructure to many more corners of Vietnam in the year to come, making the world a better place.

You can find the video here [1] and the slides here [2].

Reference:

[1] https://www.facebook.com/dangtrinhnt/videos/598772787209628/
[2] https://www.slideshare.net/vietstack

by Trinh Nguyen (noreply@blogger.com) at January 04, 2019 12:29 PM

January 03, 2019

Ben Nemec

OpenStack Virtual Baremetal Import Plans

There is some work underway to import the OVB repo from Github into the OpenStack Gerrit instance. This will allow us to more easily set up gate jobs so proposed changes can be tested automatically instead of the current "system" which involves me pulling down changes and running the test script against them. It will have some implications for users of OVB in the near future, so this is a summary of the plans and actions that need to be taken.

The first thing to be aware of that has already happened is the stable/1.0 branch has been created. Existing users who are not ready to start consuming 2.0 should start using the stable/1.0 branch as soon as possible. It is currently identical to the master branch so it should be as simple as checking out the new branch instead of master.

The stable/1.0 branch is important because, prior to importing to Gerrit, we need to merge the 2.0-dev branch to master. Feature branches aren't common in OpenStack and they create more work for the Gerrit admins, since regular users can't manage branches. To avoid that, we're planning to get all of the branches in shape before importing them. In the interest of better matching the standard OpenStack branch layout we will just be importing stable/1.0 and master (which will be 2.0 by that time). I don't have a definite timeframe on when this will all happen, but expect it to be sooner rather than later.

Finally, once the import has happened (check the status on the project-config review), all future development will be happening on git.openstack.org instead of github.com, so users of OVB will need to update their repo location. That's a less immediate concern as the prior steps need to happen first, and the Github repo won't be immediately disappearing. However, it also won't be receiving updates so you can expect it to bitrot over time.

I think that covers everything. As always, if you have any questions or comments feel free to contact me.

by bnemec at January 03, 2019 06:02 PM

Adam Young

TripleO Networks from Simplest to Not-So-Simple

If you read the TripleO setup for network isolation, it lists eight distinct networks. Why does TripleO need so many networks? Lets take it from the ground up.

WiFi to the Workstation


I run Red Hat OpenStack Platform (OSP) Director, which is the productized version of TripleO.  Everything I say here should apply equally well to the upstream and downstream variants.

My setup has OSP Director running in a virtual machine (VM). To get that virtual machine set up takes network connectivity. I perform this via Wireless, as I move around the house with my laptop, and the workstation has a built-in wireless card.

Let’s start here: Director runs inside a virtual machine on the workstation.  It has complete access to the network interface card (NIC) via macvtap.  This NIC is attached to a Cisco Catalyst switch.  A wired cable to my laptop is also attached to the switch. This allows me to setup and test the first stage of network connectivity:  SSH access to the virtual machine running in the workstation.

Provisioning Network

The Blue Network here is the provisioning network.  This reflects two of the networks from the Tripleo document:

  • IPMI* (IPMI System controller, iLO, DRAC)
  • Provisioning* (Undercloud control plane for deployment and management)

These two distinct roles can be served by the same network in my setup, and, in fact, they must be.  Why?  Because my Dell servers have a NIC that acts as both the IPMI endpoint and is the only NIC that supports PXE.  Thus, unless I wanted to do some serious VLAN wizardry, and get the NIC to switch both (tough to debug during the setup stage), I am better off with them both using untagged VLAN traffic.  Thus, each server is allocated two static IPv4 addresses, one to be used for IPMI, and one that will be assigned during the hardware provisioning.

Apologies for the acronym soup.  It bothers me, too.

Another way to think about the set of networks you need is via DHCP traffic.  Since the IPMI cards are statically assigned their IP addresses, they do not need a DHCP server.  But, the hardware’s Operating system will get its IP address from DHCP.  Thus, it is OK if these two functions share a Network.

This does not scale very well.  IPMI and IDrac can both support DHCP, and that would be the better way to go in the future, but is beyond the scope of what I am willing to mess with in my lab.

Deploying the Overcloud

In order to deploy the overcloud, the Director machine needs to perform two classes of network calls:

  1. SSH calls to the baremetal OS to launch the services, almost all of which are containers.  This is on the Blue network above.
  2. HTTPS calls to the services running in those containers.  These services also need to be able to talk to each other.  This is on the Yellow internal API network above.  I didn’t color code “Yellow” as you can’t read it.  Yellow.

Internal (not) versus External

You might notice that my diagram has an additional network; the External API network is shown in Red.

Provisioning and calling services are two very different use cases.  The most common API call in OpenStack is POST https://identity/v3/auth/token.  This call is made prior to any other call.  The second most common is the call to validate a token.  The create token call needs to be accessible from everywhere that OpenStack is used.  The validate token call does not.  But, if the API server only listens on the same network that is used for provisioning, that means the network is wide open; people that should only be able to access the OpenStack APIs can now send network attacks against the IPMI cards.

To split this traffic, either the network APIs need to listen on both networks, or the provisioning needs to happen on the external API network. Either way, both networks are going to be set up when the overcloud is deployed.

Thus, the Red Server represents the API servers that are running on the controller, and the yellow server represents the internal agents that are running on the compute node.

Some Keystone History

When a user performs an action in the OpenStack system, they make an API call.  This request is processed by the webserver running on the appropriate controller host.  Architecturally, there is no difference between a Nova server requesting a token and a project member requesting a token.  But these were seen as separate use cases, and were put on separate network ports.  The internal traffic was on port 35357, and the project member traffic was on port 5000.

It turns out that running on two different ports of the same IP address does not solve the real problem people were trying to solve.  They wanted to limit API access via network, not by port.  Thus, there really was no need for two different ports, but rather two different IP addresses.

This distinction still shows up in the Keystone service catalog, where Endpoints are classified as External or Internal.

Deploying and Using a Virtual Machine

Now our diagram has gotten a little more complicated.  Let's start with the newly added red laptop, attached to the External API network.  This system is used by our project member to create the new virtual machine via the compute create_server API call. In order:

  1. The API call comes from the outside world, travels over the Red external API network to the Nova server (shown in red)
  2. Nova posts messages to the queue, which are eventually picked up and processed by the compute agent (shown in yellow).
  3. The compute agent talks back to the other API servers (also shown in Red) to fetch images, create network ports, and connect to storage volumes.
  4. The new VM (shown in green) is created and connects via an internal, non-routable IP address to the metadata server to fetch configuration data.
  5. The new VM is connected to the provider network (also shown in green).

At this point, the VM is up and running.  If an end user wants to connect to it they can do so.  Obviously, the Provider network does not run all the way through the router to the end users system, but this path is the “open for business” network pathway.

Note that this is an instance of a provider network as Assaf defined in his post.

Tenant Networks

Let's say you are not using a provider network.  How does that change the setup?  First, let's re-label the Green network to be the “External Network.”  Notice that the virtual machines do not connect to it now.  Instead, they connect via the new, purple networks.

Note that the purple networks connect to the external network in the network controller node, shown in purple on the bottom server.  This service plays the role of a router, converting the internal traffic on the tenant network to the external traffic.  This is where the Floating IPs terminate, and are mapped to an address on the internal network.

Wrap Up

The TripleO network story has evolved to support a robust configuration that splits traffic into its component segments.  The diagrams above attempt to pass along my understanding of how they work, and why.

I’ve left off some of the story, as I do not show the separate networks that can be used for storage.  I’ve collapsed the controllers and agents into a simple block to avoid confusing detail, my goal is accuracy, but here it sacrifices precision.  It also only shows a simple rack configuration, much like the one here in my office.  The concepts presented should allow you to understand how it would scale up to a larger deployment.  I expect to talk about that in the future as well.

I’ll be sure to update  this article with feedback. Please let me know what I got wrong, and what I can state more clearly.

by Adam Young at January 03, 2019 05:27 PM

Sean McGinnis

Running for the OpenStack Board of Directors (Take 2)

Just a year ago I had put in my name for the OpenStack Board of Directors Individual Director election. The board is divided into 8 appointed directors from Platinum member companies, 8 directors elected from within the Gold member companies, and 8 individual directors from the community.

I think last year’s election was close, or at least I fared better than I had feared I would, so I am giving it another shot this year.

I was really on the fence about it in the months leading up to the nomination period. This post is a chance for me to jot down some of my thoughts that led me to decide it was the right thing to do.

The Role of Individual Members

The individual directors are members of the Foundation and are there to represent the members of the Foundation. Foundation membership is open to all, so I see this role as being a voice of the community within the Board governance - something that I personally find really valuable and necessary for something like a non-profit, open source community Board of Directors.

As part of my involvement in the OpenStack Technical Committee over the last couple of years, one of the things I have tried to encourage is more frequent and open communication between the Board and the TC. Only through good communication between these governance groups can we really be most effective and learn from each other to best support the goals of the Foundation and the needs of the community that has grown around OpenStack, open infrastructure, and those that care about the health and availability of open options.

I feel the need to preface the next part with saying there are some really great folks on the Board of Directors now that I truly believe are doing what they think is best to support the Foundation’s mission. These folks are taking time out of their busy schedules to be a part of this and give their knowledge and expertise to help guide the Foundation. Nothing I say here should be taken as a criticism for any of the decisions that have been made or as being directed by anyone that is now or has been a part of making OpenStack what it is today.

Now in that context, here’s where I see the need for folks like me to get involved.

For some board directors, the entirety of their participation in the OpenStack community is calling in to board meetings once a quarter. Some, I’m sure, are actively involved in their company’s interaction with their customers and the real world issues of trying to take the output of all of this activity and make it into something that can be packaged, supported, and used to address their customers’ needs. Which is HUGELY important.

But to really understand how to shape things and make things better, I think it’s also very important to understand the development community and the challenges, needs, and pressures happening there to make the right long term direction choices.

That’s the role the individual directors fill, and these eight people need to be able to bridge that gap between knowing the right things for the companies involved, the operators trying to run the software, and the users that ultimately are trying to do something bigger and more important than whatever is used to enable their infrastructure to support their activities.

My Goals

Unlike a company board of directors, an open source foundation board's effectiveness is limited to its members' influence within their own companies and to making the case to other individuals and companies involved. Some of this is through financial direction of the foundation, and some of it is by being able to bring in their expertise and convince others whether certain things are the right thing to focus on for the long term success of the community.

Over the last several years I have worked in various areas of the community. I’ve been a PTL of a major component of most OpenStack clouds. I’m a core member of a few projects I feel are important to the overall success and health of the community. I’ve participated on the Technical Committee since spring of 2017. I’ve also spent time pushing a broom and taking care of some areas that no one notices until they start to smell.

I think whoever gets elected, they need to have that experience and be involved in the development of the code to be able to provide input on what is happening and how the board decisions could impact them.

I’ve also been a big supporter of the Ops Meetups, helping organize those events and participating by presenting and moderating, working to make them a productive forum for operators to share best practices and war stories and to build this also-important part of the community into a thriving group. As someone that has moved from more operations to more development, I have learned a tremendous amount from these folks and am in awe at the things they’ve been able to do, their dedication to OpenStack, and their willingness to share and learn and grow from each other.

I’ve also had the opportunity lately to get out and speak to customers. Some have been great OpenStack supporters. But perhaps the more important ones for me have been the ones that are not. It’s always hard to hear why something you’ve worked on for years isn’t good enough, or easy enough, or featureful enough to meet someone’s needs. But it’s also tremendously important to hear those things.

So my plan for being on the board is to be able to take my experiences as a developer within the community, a supporter of operators and users, and someone willing to process the criticisms of the software and feed those into the efforts and decisions made by the Board. Regardless of who is elected, these are all critical things that need a voice in those discussions.

Board Election Candidates

Like last time, the good news is there are a lot of really good candidates who have been willing to step up to this role.

Some definitely have more time spent in the community. Some likely have more experience operating an OpenStack cloud or running open infrastructure. Many probably have resumes that would put mine to shame. But I do hope I get the opportunity to be one of the eight individual directors. I do think I have a unique and important enough experience to provide an important voice within the OpenStack Foundation Board of Directors.

Please watch for ballots sent out when the election opens up January 14th. Read the candidate profiles and vote for whichever ones you think will be the right individual voices on the Board. The important thing is to participate and provide your vote to make sure we have a strong and healthy Board.

by Sean McGinnis at January 03, 2019 12:00 AM

January 02, 2019

Trinh Nguyen

Searchlight weekly report - Stein R-16 & R-15



I'd been focusing on community work for the last two weeks. On 29th December, I went to Ho Chi Minh City, Vietnam to organize the meetup with the VietOpenInfra group [1]. The event went great and I had a chance to discuss the future of Searchlight with people there [2]. A roadmap for Searchlight was also drafted at the meetup: making Searchlight a universal search service for the cloud. My initial idea is to make Searchlight work with K8S, AWS, and Azure.



With the new plan, I can also start writing the use cases for Searchlight and finish the Stein-2 milestone [3] the following week.

Reference:

by Trinh Nguyen (noreply@blogger.com) at January 02, 2019 01:48 PM

January 01, 2019

Giulio Fidente

Pushing Openstack and Ceph to the Edge at OpenStack summit Berlin 2018

We presented a session at the latest OpenStack (Open Infrastructure?) summit, held in Berlin after the Rocky release, together with Sebastien Han and Sean Cohen, discussing how TripleO will support "Edge" deployments with storage at the edge: colocating Ceph with the OpenStack services in the edge zone while keeping a small hardware footprint.

Thanks to the OpenStack Foundation for giving us the chance.

by Giulio Fidente at January 01, 2019 11:00 PM

December 27, 2018

OpenStack Superuser

What you’re doing with open infrastructure: Top user stories

Superuser highlights stories about the mix of open technologies building the modern infrastructure stack including OpenStack, Kubernetes, Ceph, Cloud Foundry, OVS, OpenContrail, Open Switch, OPNFV and more. Here are some of our favorite user stories from 2018 — from a micro-installation at a California seminary to CERN’s massive open data portal.

We want to tell your story too! Get in touch at editorATopenstack.org

Zuul case study: Tungsten Fabric

Inside CERN’s Open Data portal

OpenStack upgrades on a massive scale: Inside Yahoo

Operator spotlight: St. Photios Orthodox Theological Seminary

From cryptocurrency mining to public GPU cloud: a transformation story

How one of Sweden’s largest online lenders optimizes for speed

Making cities smarter with internet of things

Looking forward to hearing your story this year: email editorATopenstack.org

The post What you’re doing with open infrastructure: Top user stories appeared first on Superuser.

by Superuser at December 27, 2018 05:06 PM

December 24, 2018

OpenStack Superuser

Kata Containers sparks joy with holiday release offering Firecracker support and more

Never deploy on a Friday. Forget about launching during the holidays. The Kata Containers team defied common wisdom (and maybe lost a few hours of egg nog-soaked bliss) by releasing 1.5.0-rc2 on December 22.

It may have been the winter solstice, but the release is definitely a bright spot: this version includes support for Amazon Web Services’ recently open-sourced Firecracker hypervisor, s390x architecture as well as a number of fixes for shimv2 support.

Spark it up

“While we do not yet have packages available for Firecracker, we do have the built binary included as part of our release tarball. A Firecracker specific tarball was created which includes all of the configurations and binaries required for running Kata+Firecracker,” writes Eric Ernst, software engineer at Intel’s Open Source Technology Center, in the documentation.

Firecracker has been tested with CRIO+Kubernetes as well as Docker, with support for multiple network interfaces as well as block-based volumes. A block-based storage driver, such as devicemapper, is required when using Kata with Firecracker. (Read this issue for details on current volume limitations.)

Check out the quick guide to Kata and Firecracker here.

Looking ahead

The Kata team isn’t taking it easy, even now. They’re already working on the next improvements, including an update to kata-deploy‘s container image that will give users a quick daemonset for installing and configuring Kata (with both QEMU and Firecracker) in a Kubernetes cluster that utilizes containerd and/or CRIO. Once the team has accomplished that, they’ll be adding admission controller support to help navigate the spectrum of runtimes configured with runtimeClass. Stay tuned!

Get involved

Kata Containers is a fully open-source project––check out Kata Containers on GitHub and join the channels below to find out how you can contribute.

The post Kata Containers sparks joy with holiday release offering Firecracker support and more appeared first on Superuser.

by Nicole Martinelli at December 24, 2018 06:14 PM

December 21, 2018

OpenStack Superuser

A primer on service-to-service communication

Paul Osman, a passionate advocate of open technology platforms and tools who has been building external and internal platforms for over 10 years, offers this primer.

In a real-world micro-service architecture, services frequently need to invoke other services in order to fulfill a user’s request. A typical user request will commonly create dozens of requests to services in your system.

In large-scale systems, problems arise less often in services themselves and more often in the communication between services. For this reason, you need to carefully consider various challenges in service-to-service communication.

When discussing service-to-service communication, it’s useful to visualize the flow of information in your system. Data flows in both directions–from the client (upstream) to the database, or event bus (downstream) in the form of requests, and back again in the form of responses.

When you refer to upstream services, you describe components of the system that are closer to the user in the flow of information. When you refer to downstream services, you describe components of the system that are further away from the user. In other words, the user makes a request that is routed to a service that then makes requests to other, downstream services, as shown in the following diagram:

In the preceding diagram, the originating user is upstream from the edge-proxy-service, which is upstream from the auth-service, attachment-service, and user-service.

In order to demonstrate the service-to-service communication, you’ll create a simple service that calls another service synchronously using the Spring Boot Java framework. You’ll create a message service that is responsible for sending messages. The message service has to invoke the social graph service in order to determine whether the sender and recipient of a message are friends before allowing a message to be sent. The following simplified diagram illustrates the relationship between services:

As you can see, a POST request comes in from the user to the /message endpoint, which is routed to message-service. The message-service service then makes an HTTP GET request to the social-service service using the /friendships/:id endpoint. The social-service service returns a JSON representation of friendships for a user.

  1. Create a new Java/Gradle project called message-service and add the following content to the build.gradle file:
group 'com.packtpub.microservices'
version '1.0-SNAPSHOT'

buildscript {
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath group: 'org.springframework.boot', name: 'spring-boot-gradle-plugin', version: '1.5.9.RELEASE'
    }
}

apply plugin: 'java'
apply plugin: 'org.springframework.boot'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile group: 'org.springframework.boot', name: 'spring-boot-starter-web'
    testCompile group: 'junit', name: 'junit', version: '4.12'
}
  2. Create a new package called com.packtpub.microservices.ch03.message and a new class called Application. This will be your service’s entry point:
package com.packtpub.microservices.ch03.message;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}
  3. Create the model. Create a package called com.packtpub.microservices.ch03.message.models and a class called Message. This is the internal representation of the message. There’s a lot missing here. You’re not actually persisting the message in this code, as it’s best to keep this example simple:
package com.packtpub.microservices.ch03.message.models;

public class Message {

    private String toUser;
    private String fromUser;
    private String body;

    public Message() {}

    public Message(String toUser, String fromUser, String body) {
        this.toUser = toUser;
        this.fromUser = fromUser;
        this.body = body;
    }

    public String getToUser() {
        return toUser;
    }

    public String getFromUser() {
        return fromUser;
    }

    public String getBody() {
        return body;
    }
}
  1. Create a new package called com.packtpub.microservices.ch03.message.controllers and a new class called MessageController. At the moment, your controller doesn’t do much except accept the request, parse the JSON, and return the message instance, as you can see from this code:
package com.packtpub.microservices.ch03.message.controllers;

import com.packtpub.microservices.ch03.message.models.Message;
import org.springframework.web.bind.annotation.*;

@RestController
public class MessageController {

    @RequestMapping(
            path="/messages",
            method=RequestMethod.POST,
            produces="application/json")
    public Message create(@RequestBody Message message) {
        return message;
    }
}
  1. Test this basic service by running it and trying to send a simple request:
$ ./gradlew bootRun
Starting a Gradle Daemon, 1 busy Daemon could not be reused, use --status for details

> Task :bootRun

  . ____ _ __ _ _
 /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/ ___)| |_)| | | | | || (_| | ) ) ) )
  ' |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot :: (v1.5.9.RELEASE)

...

Take a look at the following command line:

$ curl -H "Content-Type: application/json" -X POST http://localhost:8080/messages -d'{"toUser": "reader", "fromUser": "paulosman", "body": "Hello, World"}'

{"toUser":"reader","fromUser":"paulosman","body":"Hello, World"}

Now, you have a basic service working, but it’s pretty dumb and not doing much. Add some intelligence by checking with the social service to verify that your two users have a friendship before allowing the message to be sent. For the purposes of this example, imagine you have a working social service that allows you to check for relationships between users with requests:

GET /friendships?username=paulosman&filter=reader

{
  "username": "paulosman",
  "friendships": [
    "reader"
  ]
}
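The tutorial assumes this social service already exists. If you want to exercise the message-service locally end to end, a minimal stand-in can be sketched using nothing but the JDK’s built-in HTTP server. The class name StubSocialService and its everyone-is-friends behaviour are illustrative assumptions, not part of the book’s code; it listens on port 4567, the address the getFriendsForUser example later in this tutorial calls:

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Minimal stand-in for the social service. The real service would query a
// data store; here every user is friends with the requested filter user,
// which is enough to exercise the happy path of the message-service.
public class StubSocialService {

    // Build the JSON body the message-service expects from
    // GET /friendships?username=...&filter=...
    static String friendshipsJson(String username, String filter) {
        return "{\"username\": \"" + username + "\", \"friendships\": [\"" + filter + "\"]}";
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(4567), 0);
        server.createContext("/friendships", exchange -> {
            // Naive query-string parsing; fine for a local stub.
            String query = exchange.getRequestURI().getQuery();
            if (query == null) query = "";
            String username = "", filter = "";
            for (String pair : query.split("&")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2 && kv[0].equals("username")) username = kv[1];
                if (kv.length == 2 && kv[0].equals("filter")) filter = kv[1];
            }
            byte[] body = friendshipsJson(username, filter).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        System.out.println("Stub social service listening on http://localhost:4567");
    }
}
```

Run it in a separate terminal before starting message-service, and the GET request shown above will return the expected JSON shape.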
  1. Before you can consume this service, create a model to store its response. In the com.packtpub.microservices.ch03.message.models package, create a class called UserFriendships:
package com.packtpub.microservices.ch03.message.models;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

import java.util.List;

@JsonIgnoreProperties(ignoreUnknown = true)
public class UserFriendships {
    private String username;
    private List<String> friendships;

    public UserFriendships() {}

    public String getUsername() {
        return username;
    }

    public void setUsername(String username) {
        this.username = username;
    }

    public List<String> getFriendships() {
        return friendships;
    }

    public void setFriendships(List<String> friendships) {
        this.friendships = friendships;
    }
}


  1. Modify MessageController, adding a method that gets the list of friendships for a user, optionally filtered by a username. Note that you’re hardcoding the URL in this example, which is a bad practice. Take a look at the following code:
private List<String> getFriendsForUser(String username, String filter) {
    String url = "http://localhost:4567/friendships?username=" + username + "&filter=" + filter;
    RestTemplate template = new RestTemplate();
    UserFriendships friendships = template.getForObject(url, UserFriendships.class);
    return friendships.getFriendships();
}
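Besides being hardcoded, the concatenated URL above will also break if a username contains characters that need escaping (spaces, &, non-ASCII). One way to harden it, sketched here with only the JDK (the FriendshipUrlBuilder helper class is an illustrative assumption, not part of the book’s code), is to percent-encode each query parameter:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper: build the friendships URL with encoded query
// parameters instead of raw string concatenation. In a real service the
// base URL would come from configuration rather than a constant.
class FriendshipUrlBuilder {
    static final String BASE_URL = "http://localhost:4567/friendships";

    static String friendshipsUrl(String username, String filter) {
        try {
            return BASE_URL
                    + "?username=" + URLEncoder.encode(username, "UTF-8")
                    + "&filter=" + URLEncoder.encode(filter, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

The getFriendsForUser method could then call FriendshipUrlBuilder.friendshipsUrl(username, filter) instead of concatenating the URL itself.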
  1. Modify the create method to check friendships before accepting the message. If the users are friends, accept the message; if the users are not friends, the service will respond with a 403, indicating that the request is forbidden:
@RequestMapping(
        path="/messages",
        method=RequestMethod.POST,
        produces="application/json")
public ResponseEntity<?> create(@RequestBody Message message) {
    // Also requires imports for java.net.URI, org.springframework.http.HttpStatus,
    // org.springframework.http.ResponseEntity and
    // org.springframework.web.servlet.support.ServletUriComponentsBuilder.
    List<String> friendships = getFriendsForUser(message.getFromUser(), message.getToUser());

    if (friendships.isEmpty())
        return ResponseEntity.status(HttpStatus.FORBIDDEN).build();

    URI location = ServletUriComponentsBuilder
            .fromCurrentRequest().path("/{id}")
            .buildAndExpand(message.getFromUser()).toUri();

    return ResponseEntity.created(location).build();
}

If you found this article interesting, check out Paul Osman’s Microservices Development Cookbook. The book will help you work with a team to break a large, monolithic codebase into independently deployable and scalable microservices.

The post A primer on service-to-service communication appeared first on Superuser.

by Superuser at December 21, 2018 03:03 PM

December 20, 2018

Aptira

Software is Eating the Network. Part 4 – Network Function Virtualisation (NFV)

Aptira: Software is Eating the Network. Network Function Virtualisation (NFV)

Continuing our analysis of the Open Network Software domain, this post is about Network Function Virtualisation (NFV). 

In Open Networking, the network is no longer just a pipe but is part of the computing infrastructure and must perform some computing functions, not just move data between computing devices. 

… you’re not really interested in the devices [on a network] — you’re interested in the fabric, and the functions the network performs for you.

Urs Hölzle, Google SVP of Infrastructure (https://www.wired.com/2012/04/going-with-the-flow-google/)

As we saw last post, Software Defined Networking (SDN) revolutionised Open Networking by implementing network connectivity in software instead of hardware. 

The next step in the evolution of Open Networking came rapidly: the implementation of software network appliances that performed the “interesting” functions of the network. In October 2012 (almost immediately after the first major rollout of OpenFlow), Network Function Virtualisation (NFV) was born via a white paper published at the SDN and OpenFlow World Congress in Darmstadt, Germany. 

The white paper was presented by an Industry Specification Group (ISG) called “Network Function Virtualisation”, established within the European Telecommunications Standards Institute (ETSI) and comprising 28 representatives from 13 telcos across Europe, China, Australia and the USA. 

This white paper outlined the building blocks of NFV and envisaged its purpose to be: 

… to accelerate development and deployment of interoperable solutions based on high volume industry standard servers.

ETSI (http://portal.etsi.org/NFV/NFV_White_Paper.pdf)

Network Function Virtualisation enables Network Services to be easily and flexibly created from software components that can be stitched together “on the fly”. These components implement network functions such as firewalls, load balancers, network address translation (NAT), intrusion detection, domain name service (DNS), and so forth.  

Many of these network functions were previously implemented as dedicated, and often proprietary, physical network devices that had to be instantiated in the network and configured individually to provide an element of end-to-end service capability. 

The 2012 white paper outlined three (3) basic building blocks of NFV: 

  1. Virtualised network functions (VNFs), which are software implementations of network functions that can be deployed on NFV infrastructure. 
  2. Network function virtualisation infrastructure (NFVI), which is the hardware and software environment where VNFs are deployed and which may span several locations.  
  3. Network function virtualisation management and orchestration (NFV M&O), which is responsible for managing and orchestrating the NFVI and VNFs.

In the above model, NFVIs are the basic cloud infrastructure which we’ve covered in earlier posts, with additional standards-based network function virtualisation capabilities. 

VNFs are the building blocks of services and were originally conceived as the software equivalent of existing hardware products, as seen in the early VNF implementations from major Network Equipment Providers (NEPs) such as Ericsson, Cisco or F5. 

As a building block, there is no underlying architectural reason that this needs to be the case: VNFs are just domain-specific cloud applications which can be designed to implement the required functions in any number of architectural configurations. This “like for like” VNF implementation of existing products is more an artefact of NFV’s relative youth (remember that the NFV spec is younger than the TV adaptation of the Game of Thrones novels!). More recent VNF implementations are built from the ground up as software components, especially in open source projects. 

Software development lifecycle (SDLC) factors are emerging as much bigger influences on VNF design than existing product architectures. For example, VNFs integrate better into customer solutions if they are designed as cloud-native applications, i.e. if they comply with best-practice cloud design recommendations such as the twelve factors or the “pets versus cattle” analogy. These drivers, and the general openness of software, will result in much more interesting VNF architectures in the future. 

The third Network Function Virtualisation building block is orchestration: the management of VNFs within the NFVI. Orchestration performs on-boarding, lifecycle management and policy management for new Network Services (NS). It enables the Open Networking solution to more fully capture the benefits of a software-enabled implementation of network functions, but it depends on a high level of interoperability. 

Interoperability is a key benefit of Open Networking components, but it is possible to implement NFV components in ways that don’t fully capture these benefits. Paradoxically, the software-isation trend is both a blessing and a potential curse in the quest for better interoperability: for example, there is nothing to prevent a VNF from being implemented as a non-cloud-native application, or from only partially implementing reference APIs. 

Based on current experience, we see a very wide range of VNF implementations in the market, including from established network vendors. The implication for integrators and users is that VNFs need to be carefully onboarded and validated on a case-by-case basis. 

In the last 4 posts, we’ve covered software trends in general and three components of Open Networking, but we’ve only scratched the surface of this huge topic. 

A natural next step in our Open Networking series would be to continue with the third and last domain (Open Network Integration). However, I think it’s important that we continue our examination of the implications of the software-isation of networking and computing infrastructure. 

So, for the next few posts we are going to take a “software interlude” to unpack this topic in more detail.  Then we’ll come back to Open Network Integration and wrap up our series. 

Stay tuned. 

Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Software is Eating the Network. Part 4 – Network Function Virtualisation (NFV) appeared first on Aptira.

by Aptira at December 20, 2018 02:57 AM

December 19, 2018

OpenStack Superuser

Inside open infrastructure: The latest from the OpenStack Foundation

Welcome to the latest edition of the OpenStack Foundation Open Infrastructure Newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

Spotlight on…Kata Containers turns one!

Kata Containers is an open source pilot project supported by the OpenStack Foundation. The Kata Containers community is building a standard implementation of extremely lightweight VMs that feel and perform like containers, but provide the workload isolation and security advantages of adding a virtual machine layer. The Kata project is led by Architecture Committee members Eric Ernst (Intel), Jon Olson (Google), Samuel Ortiz (Intel), Xu Wang (Hyper) and Wei Zhang (Huawei).

Since launching Kata in December 2017, the community has achieved several milestones including the 1.0 release in May followed by several point releases, joining the Open Container Initiative (OCI) and holding the first Architecture Committee elections.

Kata’s 1.0 release completed the merger of Intel’s Clear Containers and Hyper’s runV technologies and delivered an OCI compatible runtime with seamless integration for container ecosystem technologies like Docker and Kubernetes.

Most recently the project has made rapid advancements with the 1.4.0 release which offers better logging, ipvlan/macvlan support through TC mirroring, and NEMU hypervisor support. Since its launch, Kata Containers has scaled to include support for major architectures, including AMD64, ARM and IBM p-series. The 1.5 release is currently planned for mid January 2019, and will offer support for containerd v2 shim among other features. The team is also collaborating and exploring opportunities to support the new Firecracker micro VM alongside NEMU and QEMU.

A recap of Kata’s first year activities can be found in this Superuser post. View the Kata Containers talks from the 2018 KubeCon/CloudNativeCon events here and the 2018 OpenStack Summits here. Looking ahead to 2019, the Kata community plans to focus on growing and supporting its users, leading the way for open collaboration around container security efforts and better defining its value within the greater container landscape.

Join these channels to get involved:

– GitHub: github.com/kata-containers

– Slack: bit.ly/KataSlack

– IRC: #kata-dev on Freenode

– Mailing list lists.katacontainers.io

News

OSF

  • The first Open Infrastructure Summit (formerly the OpenStack Summit) will take place April 29-May 1 in Denver, Colorado
    • The Call for Presentations is now open – review the updated Track list  and submit your presentation, panel, or workshop by January 23 at 11:59pm PT. If you’re interested in helping shape the Summit content, apply to be a Programming Committee member.
    • Denver Summit sponsorship opportunities are now available. Check out the packages and reach out to summit@openstack.org with any questions.
    • Register now before prices increase in late February.

OSF Project Updates

OpenStack

Airship

  • The Airship team is currently working toward their 1.0 release, due in early 2019, with improvements in progress across several areas. Among them, Airship will be adding support for Ironic as a bare-metal deployment platform; this work is currently in the proof-of-concept phase. If you’re interested in helping to design the bare-metal interface to the Ironic driver, please join the team for their weekly design meetings. Details for each meeting can be found on the airship-discuss mailing list.

StarlingX

  • The community is actively planning for the contributor meetup on January 15-16, 2019
  • The TSC finalized the initial project governance, which is available on the governance page of the documentation website

Zuul

Questions / Feedback / Contribute
This newsletter is edited by the OpenStack Foundation staff to highlight open infrastructure communities. We want to hear from you!

If you have feedback, news or stories that you want to share, reach us through community@openstack.org and to receive the newsletter, sign up here.

The post Inside open infrastructure: The latest from the OpenStack Foundation appeared first on Superuser.

by OpenStack Foundation at December 19, 2018 04:55 PM

NFVPE @ Red Hat

Create and restore external backups of virtual machines with libvirt

A common need for deployments in production is to have the possibility of taking backups of your working virtual machines and exporting them to some external storage. Although libvirt offers the possibility of taking snapshots and restoring them, those snapshots are intended to be managed locally, and are lost when you destroy your virtual machines. There may be a need to simply trash your whole environment and re-create the virtual machines from an external backup, so this article offers a procedure to achieve it. First step: create an external snapshot. So the first step will be taking a snapshot of your running…

by Yolanda Robla Mota at December 19, 2018 03:35 PM

December 18, 2018

OpenStack Superuser

Rev your engines: Open Infrastructure Summit seeks use cases, workshops for Denver

As you start lining up your 2019 goals, add presenting at the first Open Infrastructure Summit (formerly the OpenStack Summit) to your list. With the name change of the bi-annual event comes a broader focus on the open-source projects being built to support open infrastructure.

Today, the OpenStack Foundation opened the call for presentations (CFP), where anyone can submit a presentation or panel within one of the 11 tracks. The submission deadline is January 23.

To emphasize the broader focus of open infrastructure that the OSF has been implementing since announcing its updated strategy in November 2017, there are a few changes this cycle:

  • There are three new tracks: Security, Getting Started and Open Development. You can find the Track descriptions here.
  • In addition to OpenStack, you’ll see the new OSF pilot projects —including Airship, Kata Containers, StarlingX and Zuul — front and center alongside over 30 other open source projects including Ansible, Cloud Foundry, Docker, Kubernetes, and many more.
  • If you’re interested in influencing the Summit content, apply to be a Programming Committee member, where you can also find a full list of time requirements and expectations. Nominations will close on January 4, 2019.

First time submitting? Not sure what topic to submit? The Programming Committee members will be sharing the topics they’re looking for within each Track in January.

Stay tuned on Superuser as we post these recommendations.

The post Rev your engines: Open Infrastructure Summit seeks use cases, workshops for Denver appeared first on Superuser.

by Allison Price at December 18, 2018 04:00 PM

Galera Cluster by Codership

Announcing Galera Cluster 3.25 with several security and bug fixes

Codership is pleased to announce the release of Galera Replication library 3.25, implementing wsrep API version 25. The new release includes many bug fixes and fixes to all security issues mentioned in the Oracle Critical Patch Update Advisory – October 2018

As always, Galera Cluster is now available as targeted packages and package repositories for a number of Linux distributions, including Ubuntu, Red Hat, Debian, CentOS, OpenSUSE and SLES, as well as FreeBSD. Obtaining packages using a package repository removes the need to download individual files and facilitates the easy deployment and upgrade of Galera Cluster nodes.

This release incorporates all changes up to MySQL 5.7.24, MySQL 5.6.42 and MySQL 5.5.62.

Galera Replication Library 3.25
New features and notable fixes in Galera replication since the last binary release by Codership (3.24):

  • A new Galera configuration parameter, cert.optimistic_pa, was added. If the parameter is set to true, full parallelization in applying write sets is allowed, as determined by the certification algorithm. If set to false, no more parallelism is allowed in applying than was seen on the master.
  • Support for ECDH OpenSSL engines on CentOS 6 (galera#520)
  • Fixed compilation on Debian testing and unstable (galera#516, galera#528)

Read all notable improvements to Galera Replication library in 3.25 release notes

MySQL 5.7
New release of Galera Cluster for MySQL 5.7, consisting of MySQL-wsrep 5.7.24 and wsrep API version 25. Notable features and bug fixes in MySQL 5.7.24:

  • Support for Ubuntu/Bionic was added in this release.
  • Auth_pam and auth_dialog plugins were added in this release.
  • Rsync SST was not copying tablespace from the donor node (mysql-wsrep#334).
  • Fixes for transaction replaying from stored procedures (mysql-wsrep#336).
  • Fixed a regression where transaction replaying caused a crash when AUTOCOMMIT=OFF (mysql-wsrep#344).

Read all bug fixes in 5.7.24 release notes with known issues

MySQL 5.6
New release of Galera Cluster for MySQL, consisting of MySQL-wsrep 5.6.42 and wsrep API version 25. Notable features and bug fixes in MySQL 5.6.42:

  • Auth_pam and auth_dialog plugins were added in this release.
  • Rsync SST was not copying tablespace from the donor node (mysql-wsrep#334).
  • Fixes for transaction replaying from stored procedures (mysql-wsrep#336).
  • Fixed crash in write set applying with binlog_rows_query_log_events option enabled (mysql-wsrep#343).

Read all bug fixes in 5.6.42 release notes with known issues

MySQL 5.5 (the last release of the 5.5 series)
New release of Galera Cluster for MySQL, consisting of MySQL-wsrep 5.5.62 and wsrep API version 25. This release incorporates all changes up to MySQL 5.5.62. Please note that Galera Cluster 5.5.62 will be the last release of the 5.5 series.

  • Maximum variable length was increased to 4096 in order to work around wsrep_provider_options truncation if the provider options string contains long variable values (mysql-wsrep#348).

Read all notable improvements in 5.5.62 release notes with known issues

DOWNLOAD THE NEW GALERA CLUSTER RELEASE!

by Sakari Keskitalo at December 18, 2018 03:50 PM

December 17, 2018

OpenStack Superuser

A birds-eye view of Swift 2.20.0

OpenStack Swift is a durable, scalable, and highly available object storage system. It’s designed for storing unstructured data and is a perfect companion to scalable compute infrastructure, whether bare metal, VMs, or containers.

I’m happy to announce that Swift 2.20.0 is now available. This release includes many improvements, but the bulk of the updates are in three key areas: S3 compatibility, encryption, and performance/optimization.


Improved S3 compatibility

Swift incorporated S3 compatibility in the first half of 2018. Since then, we’ve been working on quite a few improvements. One important change we’ve made is to update the way ETag response headers look to better match what S3 clients expect. Specifically, when downloading multipart objects, S3 includes a literal “-” in the ETag, and clients use this information to determine how or if the data is validated after download. Swift’s S3 compatibility layer now matches this functionality, enabling more S3 clients to seamlessly work against Swift.

I’m also happy about an improvement we’ve made to AWS v4 signature validation. Previously, Swift would need to send the signed request to Keystone in order to authorize the account, but now Swift can simply request the signing key from Keystone and validate the request locally. This change allows Swift’s S3 compatibility layer to support many more concurrent connections and requests per second.

We’ve also added some limited support for S3 versioning and updated some default config values to more closely match S3’s behavior.

Encryption updates

Swift has supported at-rest encryption since mid-2016. This feature is designed to protect user data stored on drives to lessen the risk of data leaks if a drive were to leave the storage cluster.

Encryption in Swift uses what we call a “keymaster” to manage access to encryption keys. The keymaster is the piece of code that knows how to fetch the correct encryption keys and where to fetch them from. Swift supports a basic keymaster that stores data in a config file, a keymaster that talks to the OpenStack Barbican service, and a keymaster that talks directly to external key management systems with the KMIP protocol.

In this release, Swift now allows operators to use more than one keymaster at a time. This enables migrations from one key provider to another.

Performance optimization

On the performance side, this release of Swift includes improvements to the erasure code synchronization process. We’ve also added some tuning parameters to several other background processes so that they do not consume excessive CPU cycles in the event they are not IO bound.

Get involved!

I’ve only touched on the highlights from this release. The full changelog is available at https://github.com/openstack/swift/blob/master/CHANGELOG.

This release of Swift is the work of more than 30 developers, including 10 new contributors.

As always, you can upgrade to this version of Swift from any older version with no client downtime. I encourage everyone to upgrade to Swift 2.20.0. There’s plenty to keep us busy as we work on our next release, so if you’d like to join us, please stop by #openstack-swift on freenode IRC.

About the author

John Dickinson has been a project team lead (PTL) for Swift, OpenStack’s object storage service, pretty much since it took off in 2011. At the time he was working at Rackspace; since 2012 he’s been director of technology at San Francisco-based startup SwiftStack. You can find him on Twitter at @notmyname.


Cover image: Lefteris Stavrakas, Βουνοσταχτάρα (Alpine Swift, Tachymarptis melba), CC BY-SA 2.0

The post A birds-eye view of Swift 2.20.0 appeared first on Superuser.

by John Dickinson at December 17, 2018 11:16 PM

How your feedback sparked what’s next for Cinder

During the OpenStack Project Teams Gathering in Denver, the Cinder team gathered to discuss our development for the Stein release.  Part of the time in Denver was used to look at the responses that users shared about Cinder in the OpenStack user survey.  During the review we noted that most of the responses fell into a number of categories:

  • Backup/disaster recovery requests
  • Driver capabilities reporting
  • Multi-attach functionality
  • Improved support for actions on in-use volumes
  • Standalone Cinder support
  • High availability support
  • Ease-of-use/configuration issues

With these categories in mind, the team reconvened at the OpenStack Summit in Berlin to discuss this feedback in a Forum session.  We had hoped to engage users more directly during this Forum session but the user/operator turnout for the session was not what we had hoped for.  Instead, the team decided to discuss the feedback again in preparation for this very blog article.

In this post, I’ll provide information on how we’re moving forward given the feedback provided in the user survey. I’ll also share newer functions that are now available that may address some of the concerns. Finally, I’ll share areas of feedback that we need more information to address and provide pointers as to how users/operators can engage the Cinder team to help us understand the requests.

Backup and disaster recovery functionality

A large portion of the feedback centered on backup functionality.  Backups are an area where the Cinder team has been working for a number of releases to improve support.  For instance, in Rocky the ability to utilize multiple processors on the node running the backup service was added.  This allows users to greatly improve the performance of backup operations. We have also added the ability to back up and restore encrypted volumes.

The team is aware that there are gaps in disaster recovery functionality, especially around handling recovery after a failure when replication is enabled.  In Rocky we added the ability to use the cinder-manage command to fail control from the primary host to the secondary host after a major failure. This means that users no longer need to manually edit the database to fail over the host after a failure, and it also makes it easier to fail back.

In the Stein release we are working on adding a generic backup driver.  This will make it possible to do backups without a dedicated backups service.  Any volume driver available to Cinder can then be used as a backup target. This will enable users to backup volumes from one storage backend to a secondary storage backend in their environment to help protect against catastrophic failures on one storage backend in the environment.

Another theme in the requests for backup functionality was the ability to schedule backups. While the Cinder team appreciates the need for this functionality, it is important for users to understand that such support falls outside Cinder’s role.  Cinder and its APIs are focused on management of block storage resources, not automation of the management of those resources. Users who wish to automate storage backups can either script such processes using Cinder’s API or use other projects from OpenStack’s ecosystem, like Mistral, to automate such tasks.

Driver capabilities reporting

There were multiple requests for the ability to get more details about what functionality a particular driver/storage backend could support. This is an area where Cinder’s support has been weak for quite some time.  There were champions for improving this functionality back in the Kilo release. The team, however, wasn’t able to agree on a way to implement the functionality and the initiative, more or less, died off. As we have had an increasing number of storage backends and an increasing number of possible capabilities this functionality needs to be re-addressed.

There are members of the Cinder development team who are going to start looking into possible improvements in this functionality.  The goal is to start by getting the capabilities reporting that is available on the command line working properly, and then to expand it to support a better user experience. Ultimately we hope to give admins the ability to see from the Horizon dashboard what capabilities each storage backend supports, so the information can then be used to more easily create volume types.

Multi-attach functionality

Anyone familiar with Cinder knows that this has been a topic of discussion for quite some time.  We finally got base support for this functionality into Cinder and Nova during the Queens release. Since then, driver vendors have been working on implementing multi-attach support for their drivers. Cinder is now up to 13 drivers supporting multi-attach of volumes. Enablement of multi-attach in more drivers will require support from those third party vendors.

The Cinder team suspects that a number of the requests for multi-attach stemmed from the fact that Ceph does not yet support this functionality. This gap in support is being addressed in the Stein release.

There were also a number of requests for read-only multi-attach support. The Cinder team is aware that there are gaps in the ability to properly mount volumes in read-only mode. The team is planning to investigate this functionality during the Stein and Train releases to improve this support.

NOTE:  It’s still important to be aware that the support provided by Cinder is not a ‘magic-bullet’ that makes it possible to attach a Cinder volume to multiple instances in all situations.  The storage and filesystem behind the volume must also support multiple attachments. If a user attempts multi-attachment in read/write mode using a filesystem that doesn’t support multiple writers, filesystem corruption will result.

Actions on in-use volumes

There has been on-going work to improve the operations that can be completed on attached volumes. Recently the ability to extend attached volumes has been implemented. Again, this is a feature that requires support from the vendor’s driver. So this may, or may not, be available in your environment.

A recent area of focus has been on how to handle re-imaging attached volumes so that a base operating system could be recreated for an instance that is booted from volume without having to detach the volume. Efforts are in place to hopefully add this functionality during the Stein or Train releases.

Standalone Cinder support

A number of efforts are currently underway to improve standalone Cinder support. Cinder can be run with or without Keystone in place, and there is support for attaching volumes without Nova in place. Further work is now being done to make Cinder's volume drivers available through a new wrapper called 'cinderlib'. Cinderlib will make it possible for users to interact with Cinder's volume drivers without all of Cinder's functionality in place. It's hoped that cinderlib will provide a way for services like the Container Storage Interface to leverage Cinder's existing storage drivers.

Improved high-availability support

Incremental movement towards full HA, Active/Active support has been going on within Cinder for the last few releases. It has been possible to run the API and Scheduler services Active/Active for quite some time. Running the volume service in Active/Active mode, however, has proven more challenging. Depending on whether a driver has any locking in its code, some drivers may be able to run HA A/A, but not without risk.

In the Stein release, Cinder is planning to adopt the distributed locking techniques used by the Placement Service to ensure that we can safely have multiple volume instances running without potentially ending up with inaccurate quotas or deadlocks attempting to access the database.  The team has also written documentation describing the potential dangers of running the volume service in active/active mode. This documentation should serve as a guide for the storage vendors to test and improve their volume drivers to support HA A/A configurations.

Ease-of-use/configuration improvements

The user feedback included a number of general requests for improvements in ease of use and configuration. The team is currently doing a number of things to address these concerns. For instance, we are aware that we do not have parity between the functions implemented in the OpenStack Cinder client and OpenStack Client/Horizon. To help address this, as we move to using StoryBoard as our bug tracker we are planning to put processes in place to tag changes that also need to be propagated to OSC and Horizon. We have also been working to improve our documentation. This entails filling in functions that were never properly documented, or whose documentation needs updating due to changes since the functionality was implemented.

Based on user feedback we have also in recent releases added support for more dynamically configuring logging levels within Cinder, making it possible to make log output more or less verbose without having to restart services.

Much of the other feedback around ease-of-use improvements was quite general. We would like to better understand such requests. Hopefully you can help.

Cinder needs your help

The team has done the best we can to interpret the brief feedback that users can give in the user feedback survey. Many of the comments, however, were very short with limited detail. This is why we need your help to fill in the details.

Some of the areas that had a lot of general feedback that could use additional detail in the request are:  ease of use/configuration issues, requests for improved support for actions on attached volumes, improved usability of snapshots and the addition of more backup functionality outside of the ability to automate creation of backups.

If you have additional details that you would like to share, there are a number of ways that you can help. First, you can send an e-mail to the openstack-discuss@lists.openstack.org mailing list. Just tag your post with [cinder] and [user request]. We can then discuss the requests on the mailing list. We would also be happy to have you join us during our weekly Cinder team meeting.

The team meets weekly on Wednesdays at 16:00 UTC on the #openstack-meeting IRC channel on freenode. If you have a topic, please see this page for instructions on adding your topic to the meeting: https://wiki.openstack.org/wiki/CinderMeetings. Finally, the team is always available for discussion in the #openstack-cinder IRC channel on freenode.

About the author

Jay Bryant, cloud storage lead at Lenovo, is the project team lead (PTL) for the current Cinder release. He also served as the IBM subject matter expert (SME) for Cinder from January 2014 to January 2017. Find him on Twitter or check out his OpenStack community profile.

Cover photo // CC BY NC

The post How your feedback sparked what’s next for Cinder appeared first on Superuser.

by Jay Bryant at December 17, 2018 02:56 PM

StackHPC Team Blog

Ceph on the Brain: A Year with the Human Brain Project

Background

The Human Brain Project (HBP) is a 10-year EU FET flagship project seeking to “provide researchers worldwide with tools and mathematical models for sharing and analysing large brain data they need for understanding how the human brain works and for emulating its computational capabilities”. This ambitious and far-sighted goal has become increasingly relevant during the lifetime of the project with the rapid uptake of Machine Learning and AI (in its various forms) for a broad range of new applications.

A significant portion of the HBP is concerned with massively parallel applications in neuro-simulation, in analysis techniques to interpret data produced by such applications, and in platforms to enable these. The advanced requirements of the HBP in terms of mixed workload processing, storage and access models are way beyond current technological capabilities and will therefore drive innovation in the HPC industry. The Pre-commercial procurement (PCP) is a funding vehicle developed by the European Commission, in which an industrial body co-designs with a public institution an innovative solution to a real-world technical problem, with the intention of providing the solution as commercialized IP.

The Jülich Supercomputer Centre, on behalf of the Human Brain Project, entered into a competitive three-phased PCP programme to design next-generation supercomputers for the demanding brain simulation, analysis and data-driven problems facing the wider Human Brain Project. Two consortia - NVIDIA and IBM, and Cray and Intel - were selected to build prototypes of their proposed solutions. The phase III projects ran until January 2017, but Cray's project deferred significant R&D investment, and was amended and extended. Following significant activity supporting the research efforts at Jülich, JULIA was finally decommissioned at the end of November.

Introducing JULIA

JULIA

In 2016, Cray installed a prototype named JULIA, with the aim of exploring APIs for access to dense memory and storage, and the effective support of mixed workloads. In this context, mixed workloads may include interactive visualisation of live simulation data and the possibility of applying feedback to "steer" a simulation based on early output. Flexible exploitation of new hardware and software aligns well with Cray's vision of adaptive supercomputing.

JULIA is based on a Cray CS400 system, but extended with some novel hardware and software technologies:

  • 60 Intel Knights Landing compute nodes
  • 8 visualisation nodes with NVIDIA GPUs
  • 4 data nodes with Intel Xeon processors and 2x Intel Fultondale P3600 SSDs
  • All system partitions connected using the Omni-Path interconnect
  • A remote visualisation system for concurrent post-processing and in-transit visualisation of data, primarily from neurosimulation
  • An installed software environment combining conventional HPC toolchains (Cray, Intel, GNU compilers), and machine learning software stacks (e.g. Theano, caffe, TensorFlow)
  • A storage system consisting of SSD-backed Ceph
JULIA

StackHPC was sub-contracted by Cray to perform analysis and optimisation of the Ceph cluster. Analysis work started in August 2017.

Ceph on JULIA

The Ceph infrastructure comprises four data nodes, each equipped with two P3600 NVME devices and a 100G Omnipath high-performance network:

JULIA

Each of the NVMe devices is configured with four partitions. Each partition is provisioned as a Ceph OSD, providing a total of 32 OSDs.
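As a quick arithmetic check of the layout described above (all figures from the text):

```python
# Sanity-check the OSD layout: 4 data nodes, 2 NVMe devices per node,
# 4 partitions (one OSD each) per device.
nodes = 4
nvme_per_node = 2
osds_per_nvme = 4

total_osds = nodes * nvme_per_node * osds_per_nvme
print(total_osds)  # 32
```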

JULIA

The Ceph cluster was initially running the Jewel release of Ceph (current at the time). After characterising the performance, we started to look for areas for optimisation.

High-Performance Fabric

The JULIA system uses a 100G Intel Omni-Path RDMA-centric network fabric, also known as OPA. This network is conceptually derived and evolved from InfiniBand, and reuses a large proportion of the InfiniBand software stack, including the Verbs message-passing API.

Ceph's networking is predominantly TCP/IP-based, which is supported on this fabric through IP-over-InfiniBand (ipoib), a kernel network driver that enables the Omni-Path network to carry layer-3 IP traffic.

The ipoib network driver enables connectivity, but does not unleash the full potential of the network. Performance is good on architectures where a processor core is sufficiently powerful to maintain a significant proportion of line speed and protocol overhead.

This Sankey diagram illustrates the connectivity between different hardware components within JULIA:

JULIA

In places there are two arrows, as the TCP performance was found to be highly variable. Despite some investigation, the underlying reason for the variability is still unclear to us.

Using native APIs, Omni-Path will comfortably saturate the 100G network link. However, the ipoib interface falls short of the mark, particularly on the Knights Landing processors.

Raw Block Device Performance

In order to understand the overhead of filesystem and network protocol, we attempt to benchmark the system at every level, moving from the raw devices up to the end-to-end performance between client and server. In this way, we can identify the achievable performance at each level, and where there is most room for improvement.

Using the fio I/O benchmarking tool, we measure the aggregated block read performance of all NVMe partitions in a single JULIA data server. We used four fio clients per partition (32 in total) and 64KB reads. The results are stacked to give the raw aggregate bandwidth for a single node:

JULIA

The aggregate I/O read performance achieved by the data server is approximately 5200 MB/s. If we compare the I/O read performance per node with the TCP/IP performance across the ipoib interface, we can see that actually the two are somewhat comparable (within the observed bounds of variability in ipoib performance):

JULIA

Given that realistic access patterns are likely to include serving data from the kernel buffer cache, which can take a sizeable proportion of each data node's 64G of RAM, the ipoib network performance is likely to become a bottleneck.

Jewel to Luminous

Preserving the format of the backend data store, the major version of Ceph was upgraded from Jewel to Luminous. Single-client performance was tested using rados bench before and after the upgrade:

JULIA

The results that we see indicate a solid improvement for smaller objects (below 64K) but negligible difference otherwise, and no increase in peak performance.

Filestore to Bluestore

The Luminous release of Ceph introduced major improvements in the Bluestore backend data store. The Ceph cluster was migrated to Bluestore and tested again with a single client node and rados bench:

JULIA

There is a dramatic uplift in performance for larger objects for both reads and writes. The peak RADOS object bandwidth is also within the bounds of the observed limits achieved by the ipoib network interface. This level of performance is becoming less of an I/O problem and more of a networking problem.

That's a remarkable jump. What just happened?

The major differences appear to be the greater efficiency of a bespoke storage back-end over a general-purpose filesystem, and also reduction in the amount of data handling through avoiding writing first to a journal, and then to the main store.

Write Amplification

For every byte written to Ceph via the RADOS protocol, how many bytes are actually written to disk? To find this, we sample disk activity using iostat, aggregate across all devices in the cluster and compare with the periodic bandwidth reports of rados bench. The result is a pair of graphs, plotting RADOS bandwidth against the bandwidth of the underlying devices over time.
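The comparison can be sketched as follows; the helper function and sample values here are illustrative, not measured data:

```python
# Given matching samples of client-side RADOS write bandwidth (from
# rados bench) and summed device write bandwidth (from iostat), the
# ratio of the totals is the observed write amplification factor.
def write_amplification(rados_mbps, device_mbps):
    """Bytes hitting disk per byte written by the client."""
    return sum(device_mbps) / sum(rados_mbps)

rados_samples = [100.0, 110.0, 105.0]    # MB/s reported by rados bench
device_samples = [450.0, 495.0, 472.5]   # MB/s summed over all devices

factor = write_amplification(rados_samples, device_samples)
print(round(factor, 1))  # 4.5
```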

Here's the results for the filestore backend:

Filestore backend, RADOS bandwidth
Filestore backend, iostat bandwidth

There appears to be a write amplification factor of approximately 4.5x: the combination of a 2x replication factor, every object being written first through a collocated write journal, and a small amount of additional overhead for filesystem metadata.

What is interesting to observe is the periodic freezes in activity as the test progresses. These are believed to be the filestore back-end subdividing object store directories when they exceed a given threshold.

Plotted with the same axes, the bluestore configuration is strikingly different:

Bluestore backend, RADOS bandwidth
Bluestore backend, iostat bandwidth

The device I/O performance is approximately doubled, and sustained. The write amplification is reduced from 4.5x to just over 2x (because we are benchmarking here with 2x replication). It is the combination of these factors that give us the dramatic improvement in write performance.

Sustained Write Effects

On the P3600 devices, performing sustained writes for long periods eventually leads to performance degradation. This can be observed as a halving of device write performance, and erratic and occasionally lengthy commit times.

This effect can be seen in the results of rados bench when plotted over time. In this graph, bandwidth is plotted in green and commit times are impulses in red:

JULIA

This effect made it very hard to generate repeatable write benchmark results. The assumed cause was activity within the NVMe controller once the available pool of free blocks became depleted.

Scaling the Client Load

During idle periods on the JULIA system it was possible to harness larger numbers of KNL systems as Ceph benchmark clients. Using concurrent runs of rados bench and aggregating the results, we could get a reasonable idea of Ceph's scalability (within the bounds of the client resources available).

We were able to test with configurations of up to 20 clients at a time:

Luminous Ceph, RADOS read performance

It was interesting to see how the cluster performance became erratic under heavy load and high client concurrency.

The storage cluster BIOS and kernel parameters were reconfigured to a low-latency / high-performance profile, and processor C-states were disabled. This appeared to help with sustaining performance under high load (superimposed here in black):

Luminous Ceph, RADOS read performance

Recalling that the raw I/O read performance of each OSD server was benchmarked at 5200 MB/s, giving an aggregate performance across all four servers of 20.8 GB/s, our peak RADOS read performance of 16.5 GB/s represents about 80% of peak raw performance.
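That efficiency figure follows directly from the numbers quoted above:

```python
# Peak RADOS read performance as a fraction of raw aggregate read
# performance (figures from the text, in GB/s).
per_server_raw = 5.2
raw_aggregate = 4 * per_server_raw    # 20.8 GB/s across four servers
peak_rados = 16.5

efficiency = peak_rados / raw_aggregate
print(f"{efficiency:.0%}")  # 79%
```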

Spectre/Meltdown Strikes

At this point, microcode and kernel mitigations were applied for the Spectre/Meltdown CVEs. After retesting, the aggregate raw I/O read performance per OSD server was found to have dropped by over 15%, from 5200 MB/s to 4400 MB/s. The aggregate raw read performance of the Ceph cluster was now 17.6 GB/s.
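The regression is easy to verify from those figures:

```python
# Per-server raw read bandwidth before and after the Spectre/Meltdown
# mitigations (MB/s, figures from the text).
before, after = 5200, 4400

drop = 1 - after / before
print(f"{drop:.1%}")        # 15.4%, i.e. over 15%

cluster_raw = 4 * after / 1000
print(cluster_raw)          # 17.6 GB/s aggregate across four servers
```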

Luminous to Mimic

Along with numerous improvements and optimisations, the Mimic release also heralded the deprecation of support for raw partitions for OSD backing, in favour of standardising on LVM volumes.

Using an Ansible Galaxy role, we zapped our cluster and recreated a similar configuration within LVM. We retained the same configuration of four OSDs associated with each physical NVMe device. Benchmarking the I/O performance using fio revealed little discernible difference.

We redeployed the cluster using LVM and ceph-ansible and re-ran the rados bench tests. The difference at the RADOS level was dramatic for object sizes of 64K and bigger:

Mimic Ceph, LVM OSDs, RADOS read performance

Reprovisioning again with raw partitions (and ignoring the deprecation warnings) restored, and even improved upon, the previous levels of performance:

Mimic Ceph, raw partition OSDs, RADOS read performance

Taking into account the Spectre/Meltdown mitigations, Ceph Mimic is delivering up to 92% efficiency over the RADOS protocol.

UPDATE: After presenting these findings at Ceph Day Berlin, Sage Weil introduced me to the Ceph performance team at Red Hat, and in particular Mark Nelson. Mark helped confirm the issue and assisted with analysis of the root cause. It looks likely that Bluestore+LVM suffers the same issue as XFS+LVM on Intel NVMe devices, as reported here (Red Hat subscription required). The fix is to upgrade the kernel to the latest available for Red Hat / CentOS systems.

Unfortunately, by this time JULIA had reached the end of its project lifespan and we were not able to verify this. However, on a different system with a newer hardware configuration, I was able to confirm that the performance issues occur with kernel-3.10.0-862.14.4.el7 and are resolved in kernel-3.10.0-957.1.3.el7.

Native Network Performance for HPC-Enabled Ceph

When profiling the performance of this system using perf and flame graph analysis, I found that under high load 52.5% of the time appeared to be spent in networking: either in the Ceph messenger threads, the kernel TCP/IP stack or the low-level device drivers.

Mimic Ceph, flame graph profile

A substantial amount of this time is actually spent in servicing page faults (a side-effect of the Spectre/Meltdown mitigations) when copying socket data between kernel space and user space. This performance data makes a strong case, at least for systems with this balance of compute, storage and networking, for bypassing kernel space, bypassing TCP/IP (with its inescapable copying of data) and moving to a messenger class that offers RDMA.

When the end of the JULIA project was announced, and our users left the system, we upgraded Ceph one final time, from Mimic to the master branch.

Ceph, RDMA and OPA

Ceph has included messenger classes for RDMA for some time. However, our previous experience of using these with a range of RDMA-capable network fabrics (RoCE, InfiniBand and now OPA) was that they work reasonably well for RoCE, but not for InfiniBand or OPA.

For RDMA support, the systemd unit files for all communicating Ceph processes must have virtual memory page pinning permitted, and access to the devices required for direct communication with the network fabric adapter:

For example, in /usr/lib/systemd/system/ceph-mon@.service, add:

[Service]
LimitMEMLOCK=infinity
PrivateDevices=no

Clients also require support for memory locking, which can be added by inserting the following into /etc/security/limits.conf:

* hard memlock unlimited
* soft memlock unlimited

Fortunately, Intel recently contributed support for iWARP (another RDMA-enabled network transport). This support is not actually iWARP-specific: it introduces use of a protocol parameter broker known as the RDMA connection manager, which provides greater portability for RDMA connection establishment on a range of different fabrics.

To enable this support in /etc/ceph/ceph.conf (here for the OPA hfi1 NIC):

ms_async_rdma_device_name = hfi1_0
ms_async_rdma_polling_us = 0
ms_async_rdma_type = iwarp
ms_async_rdma_cm = True
ms_type = async+rdma

Using the iWARP RDMA messenger classes (actually on OPA and InfiniBand) got us a lot further, thanks to the connection manager support. However, with OPA the maintenance of cluster membership was irregular and unreliable. Further work is required to iron out these issues, but unfortunately our time on JULIA has ended.

Looking Ahead

The project drew to a close before our work on RDMA could be completed to satisfaction, so it is premature to post results here. I am aware of other people becoming increasingly active in the Ceph RDMA messaging space. In 2019 I hope to see the release of Mellanox's development project for a new RDMA-enabled messenger class based on the UCX communication library. (An equivalent effort based on libfabric could be even more compelling.)

Looking further ahead, the adoption of Scylla's Seastar could potentially become a game-changer for future developments with high-performance hardware-offloaded networking.

For RDMA technologies to be adopted more widely, the biggest barriers appear to be testing and documentation of best practice. At StackHPC, we hope to become more active in these areas through 2019.

Acknowledgements

This work would not have been possible (or been far less informative) without the help and support of a wide group of people:

  • Adrian Tate and the team from the Cray EMEA Research Lab
  • Dan van der Ster from CERN
  • Mark Nelson, Sage Weil and the team from Red Hat
  • Lena Oden, Bastian Tweddell and the team from Jülich Supercomputer Centre

by Stig Telfer at December 17, 2018 01:00 PM

December 16, 2018

Trinh Nguyen

Searchlight weekly report - Stein R-17


I finished my tasks at work earlier this week, so I had free time to write this week's report for Searchlight. The main focus of this week was helping Searchlight pass its functional tests. After several attempts, I figured out that the functional tests fail because, for some reason, ElasticSearch needs more time to start. So I told the test setup to wait 10 seconds for ElasticSearch to be fully up and running [1]. I also updated some packages and simplified the test-setup.sh script by moving the jdk8 installation logic out to bindep.txt [2] [3].
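A minimal sketch of that idea, for illustration only: rather than a fixed sleep, the test setup could poll ElasticSearch's port until it accepts connections. The host, port and function name below are assumptions, not the actual code from the linked review:

```python
# Poll a TCP port until it accepts connections, giving up after a
# timeout. ElasticSearch listens on 9200 by default.
import socket
import time

def wait_for_elasticsearch(host="127.0.0.1", port=9200, timeout=10.0):
    """Return True once the port accepts a TCP connection, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```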

Looking deeper into the issue, I observed that ElasticSearch has some strange behavior in the new test environment, which is Ubuntu 18.04. The ElasticSearch installation task in the devstack test also fails [4]. I'm not sure what happened with the Ubuntu packages, but maybe we need to tune the test setup for ElasticSearch to make it work again.

Anyway, we can merge new code of Searchlight now!! Yay!!!


Reference:

[1] https://review.openstack.org/#/c/621996/8/searchlight/tests/functional/__init__.py
[2] https://review.openstack.org/#/c/621996/8/tools/test-setup.sh
[3] https://review.openstack.org/#/c/621996/8/bindep.txt
[4] https://review.openstack.org/#/c/625174/

by Trinh Nguyen (noreply@blogger.com) at December 16, 2018 03:20 PM

December 14, 2018

Emilien Macchi

OpenStack Containerization with Podman – Part 4 (Healthchecks)

For this fourth episode, we’ll explain how we implemented healthchecks for Podman containers. Don’t miss the first, second and third episodes where we learnt how to deploy, operate and upgrade Podman containers.

In this post, we’ll see the work that we have done to implement container healthchecks with Podman.

Note: Jill Rouleau wrote the code in TripleO to make that happen.

Context

Docker can perform health checks directly in the Docker engine, without the need for an external monitoring tool or sidecar containers.

A script (usually per-image) would be run by the engine, and the return code would define whether or not a container is healthy.
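The model can be sketched in a few lines; the function and commands below are illustrative, not Docker's actual implementation:

```python
# Run a per-image check command; exit code 0 means healthy.
import subprocess

def container_health(check_cmd):
    """Map a check command's exit code to a health state."""
    result = subprocess.run(check_cmd, capture_output=True)
    return "healthy" if result.returncode == 0 else "unhealthy"

print(container_health(["true"]))   # healthy
print(container_health(["false"]))  # unhealthy
```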

Example of healthcheck script:

curl -g -k -q --fail --max-time 10 --user-agent curl-healthcheck \
--write-out "\n%{http_code} %{remote_ip}:%{remote_port} %{time_total} seconds\n" https://my-app:8774 || return 1

It was originally built so that unhealthy containers can be rescheduled or removed by the Docker engine. The health could be verified with the docker ps or docker inspect commands:

$ docker ps
my-app "/entrypoint.sh" 30 seconds ago Up 29 seconds (healthy) 8774/tcp my-app

However, with Podman we don't have that kind of engine anymore. That monitoring interface has been useful in our architecture, though, so we wanted to keep it: our operators use it to verify the state of the containers.

Several options were available to us:

  • systemd timers (like cron) to schedule the health checks. Example documented on coreos manuals.
  • use Podman Pods with Side Car container running health checks.
  • add a scheduling function to conmon.
  • systemd service, like a podman-healthcheck service that would run on a fixed interval.

If you remember from the previous posts, we decided to get some help from systemd to control the containers: automatic restart on failure and automatic start at boot. With that said, we decided to go with the first option, which seemed the easiest to integrate and the least invasive.

Implementation

The systemd timer is a well-known mechanism. It's basically a native feature in systemd that allows running a specific service on a controlled schedule. The service is a "oneshot" type, executing the healthcheck script present in the container image.

Here is how we did it for our OpenStack containers (with a 30-second healthcheck timer, configurable like a cron):

# my_app_healthcheck.timer

[Unit]
Description=my_app container healthcheck
Requires=my_app_healthcheck.service
[Timer]
OnUnitActiveSec=90
OnCalendar=*-*-* *:*:00/30
[Install]
WantedBy=timers.target
# my_app_healthcheck.service

[Unit]
Description=my_app healthcheck
Requisite=my_app.service
[Service]
Type=oneshot
ExecStart=/usr/bin/podman exec my_app /bin/healthcheck
[Install]
WantedBy=multi-user.target

Activate the timer and service:

$ systemctl daemon-reload
$ systemctl enable --now my_app_healthcheck.service
$ systemctl enable --now my_app_healthcheck.timer

Check the service & timer status:

$ service my_app_healthcheck status
Redirecting to /bin/systemctl status my_app_healthcheck.service
● my_app_healthcheck.service - my_app healthcheck
   Loaded: loaded (/etc/systemd/system/my_app_healthcheck.service; enabled; vendor preset: disabled)
   Active: activating (start) since Fri 2018-12-14 20:11:00 UTC; 158ms ago
 Main PID: 325504 (podman)
   CGroup: /system.slice/my_app_healthcheck.service
           └─325504 /usr/bin/podman exec my_app /bin/healthcheck
Dec 14 20:11:00 myhost.localdomain systemd[1]: Starting my_app healthcheck...

$ service my_app_healthcheck.timer status
Redirecting to /bin/systemctl status my_app_healthcheck.timer
● my_app_healthcheck.timer - my_app container healthcheck
   Loaded: loaded (/etc/systemd/system/my_app_healthcheck.timer; enabled; vendor preset: disabled)
   Active: active (waiting) since Fri 2018-12-14 18:42:22 UTC; 1h 30min ago
Dec 14 18:42:22 myhost.localdomain systemd[1]: Started my_app container healthcheck.

$ systemctl list-timers
NEXT                         LEFT          LAST                         PASSED       UNIT                                               ACTIVATES
Fri 2018-12-14 20:14:00 UTC  361ms left    Fri 2018-12-14 20:13:30 UTC  29s ago      my_app_healthcheck.timer               my_app_healthcheck.service

Now it’s implemented, let’s try it!

Demo

Stay in touch for the next post in the series of deploying TripleO and Podman!

Source of the demo.

by Emilien at December 14, 2018 09:46 PM

Ben Nemec

Openstack Virtual Baremetal 2.0 Update

As mentioned in a previous update, OVB 2.0 is coming. This update is to notify everyone that a development branch is available in the repo, and to discuss some of the changes made so far.

First, here's the 2.0-dev branch. It currently contains most, if not all, of the 2.0 changes that I have planned, so it can be used for testing. Please do test it if you depend on OVB for anything. I believe the migration should be fairly painless if you've been keeping up with the recommended deployment methods, but if you do find any problems, let me know as soon as possible.

Here's a general overview of the changes for OVB 2.0:

  • Routed networks. 2.0 is based on the routed-networks branch so it supports all of the functionality added as part of that work.
  • Using the parameters section of a Heat environment is no longer supported. Everything must be in parameter_defaults now.
  • port-security is used by default. This requires at minimum a Mitaka OpenStack cloud, although most of my testing has been Newton or higher. Symlinks have been left for the old port-security environments to ease migration. This also means it is no longer recommended to use the Neutron noop firewall driver.
  • Most, if not all, deprecated features and options have been removed.
  • The BMC now signals back to Heat whether it has succeeded or failed to bring up the BMC services. This should allow failures to be caught much earlier in the deploy process.
  • The BMC install script now uses only OpenStackClient, rather than the service-specific clients. Note that this requires an updated BMC image. Existing BMC images can still be used, but they will have to install the openstackclient package. It is recommended that you pull down the new image so no external resources are needed for BMC deployment. Note that the new BMC image is now the default one for download, but if you need the old one for some reason it is still available too.
  • Some (mostly internal) interfaces have changed. As a result, if you have custom environments or templates it is possible that they will need updating for use with 2.0. If you're only using the default templates and environments shipped with OVB they should continue to work.

EDIT I forgot one other thing. There's an RFE for making the BMC installation more robust that isn't yet in 2.0. I have a GitHub issue open to track it. It doesn't involve any breaking changes though so it doesn't have to block 2.0 from going live (in fact, it could be done in 1.0 too). /EDIT

I think that covers the highlights. If you have any questions or concerns about these changes don't hesitate to contact me, either through the GitHub repo or #tripleo on Freenode. Thanks.

by bnemec at December 14, 2018 06:12 PM

OpenStack Superuser

Why evolving tech is powered by new, diverse contributors

“Ba bump. Ba bump. Do you hear it? That’s the heartbeat of the data center, powered by OpenStack. That is what you folks have built. We have so much strength in this community. You’ve worked so hard to enable deployable, stable, mature, secure software that enables private and hybrid cloud models to flourish across the ecosystem,” said Melissa Evers-Hood, director of edge and cloud orchestration stacks at Intel as she spoke to the audience at the recent OpenStack Summit in Berlin. This strength is due, in part, to the amazing diversity within the OpenStack community, which was great to see throughout the week.

As OpenStack evolves to support open infrastructure—highlighted by Mark Collier through pilot projects Airship, Kata Containers, StarlingX and Zuul—and as exciting new use cases emerge, the ability to attract and retain diverse talent, and for us to collaborate across many different projects and communities, has never been more important.

It was great to see Joseph Sandoval talk in his keynote about the importance of new contributors, and of mentors and mentorship programs as core to sustaining our community. “OpenStack is a strategic platform that I believe will enable diversity … I’ve been helping individuals … bringing them along and pointing them in the direction of open source projects where they can learn and find their place within those communities, and giving them the technical acumen so they can succeed and find their way.” He reminded us that, “Mentorship comes in many forms … Show up and support these programs. These programs need you.”

The mentorship theme was carried through a speed mentoring luncheon, facilitated by Amy Marrich and Nicole Huesman. The workshop, which has become a mainstay at the summits, attracted a nice turnout and supported rich engagement between mentors and mentees across career, community and technical tracks.

The diversity luncheon, hosted by the Diversity Working Group and sponsored by Intel, opened its doors for the first time to a more diverse audience. This evolution seemed to parallel that of OpenStack itself, and it was wonderful to see full representation—across women, men and underrepresented minorities—to recognize and celebrate the diversity within our community. Melissa Evers-Hood from Intel welcomed guests and Madhuri Kumari, cloud software engineer at Intel, offered her personal experiences of a young, new contributor in the OpenStack community. Joseph Sandoval then offered his insights about the importance of male allies and advocates, as well as mentoring, to build more diverse, inclusive communities.

Ell Marquez, technical evangelist at Linux Academy, led a mentoring panel discussion on the last day of the event. The panel discussion helped facilitate a robust discussion about mentorship programs, with two points of clarity. It’s clear that there’s deep interest in mentorship programs. While the OpenStack community has several programs and resources for new contributors, greater awareness and education is needed around how to tap into them—a great problem to have!

Panelists included:

  • Amy Marrich, OpenStack User Committee Member and Diversity Working Group chair, and OpenStack course architect at Linux Academy
  • Nicole Huesman, community and developer advocate, Intel
  • Jill Rouleau, senior software engineer, Red Hat
  • Daniel Izquierdo, co-founder, Bitergia

The week in Berlin was an inspiring testament to the progress and momentum of the diversity within the OpenStack community. As OpenStack takes shape as the foundation for the open infrastructure and new projects and technologies emerge to tackle the challenges of IoT, edge and other exciting use cases, we’ll continue to strive for greater diversity within our community and welcome new contributors of all sizes, shapes and colors into our thriving community.

Stay tuned for how to get involved in upcoming events for the Open Infrastructure Summit in 2019.

The post Why evolving tech is powered by new, diverse contributors appeared first on Superuser.

by Superuser at December 14, 2018 03:19 PM

December 13, 2018

OpenStack Superuser

How to use Alexa to keep an eye on your virtual machines

With about 100 lines of Python code, Amazon’s virtual assistant Alexa can check up on your data center.

“Bear with me, I’m not a programmer,” says Sebastian Wenner of T-Mobile before offering up his proof-of-concept of how to connect Alexa with the Open Telekom Cloud at the recent OpenStack Summit Berlin.

He showed participants what components are needed, how to interact with Alexa and the Alexa Skills Kit (ASK) and demonstrated a few functions, too.  To create the POC, he “just applied some common sense here and some historic knowledge from my studies in the last millennium.”

Why use Alexa? She’s already a fixture in many homes — playing music, controlling smart devices, delivering news and weather — and the device is only expected to become more popular. By 2022, more than half of American homes will have a smart speaker, according to Juniper Research.

Here’s what you’ll need to get started:


And then you need to learn how to talk to Amazon’s cloud-based voice service. The first thing you need to know is something called an “utterance,” that is, something that you say, Wenner notes. You define a string like “status report” that’s heard by the device; that utterance then gets translated to an intent, and the intent is called to trigger an action.

Wenner’s example?

“Alexa, start OTC control center.”

Alexa: “OTC control center is online.”

“How many VMs are running inside my tenant?”

Alexa: “The total number of virtual machines in your tenant is 52 at the moment, 39 are running, 13 are shut down.”

“Shut down.”

Alexa: “OTC center shutting down.”
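
The utterance-to-intent flow Wenner describes can be sketched in a few lines of Python. This is a hypothetical reconstruction, not Wenner’s actual code: the intent name `VmStatusIntent` and the `count_vms()` stub are invented for illustration, and the handler follows the plain JSON request/response shape a custom Alexa skill exchanges with an AWS Lambda function.

```python
def count_vms():
    """Stand-in for a real query against the Open Telekom Cloud API (hypothetical)."""
    return {"total": 52, "running": 39, "stopped": 13}

def speech_response(text, end_session=True):
    """Build the JSON structure the Alexa service expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context=None):
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # "Alexa, start OTC control center."
        return speech_response("OTC control center is online.", end_session=False)
    if request["type"] == "IntentRequest":
        intent = request["intent"]["name"]
        if intent == "VmStatusIntent":
            # Triggered by an utterance like "how many VMs are running inside my tenant?"
            vms = count_vms()
            return speech_response(
                "The total number of virtual machines in your tenant is "
                f"{vms['total']} at the moment, {vms['running']} are running, "
                f"{vms['stopped']} are shut down."
            )
        if intent == "AMAZON.StopIntent":
            return speech_response("OTC control center shutting down.")
    return speech_response("Sorry, I did not understand that.")
```

The skill definition in the Alexa Skills Kit maps the spoken utterances to the intent names; the handler above only deals with what arrives after that translation.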

This isn’t the first time a Stacker has made a connection with Alexa. In 2017, Jaesuk Ahn and Seungkyu Ahn kicked off OpenStack Days Korea with a live demo asking Amazon’s Alexa AI to deploy OpenStack in three minutes using containers. (Check out that demo here.)

See Wenner show off his handiwork in the nine-minute video below and download the slides with details on the architecture here, or find more on GitHub.

Cover photo // CC BY NC

The post How to use Alexa to keep an eye on your virtual machines appeared first on Superuser.

by Superuser at December 13, 2018 04:58 PM

December 12, 2018

SWITCH Cloud Blog

Hack Neutron to add more IP addresses to an existing subnet

When we designed our OpenStack cloud at SWITCH, we created a network in the service tenant, and we called it private.

This network is shared with all tenants and it is the default choice when you start a new instance. The name private comes from the fact that you will get a private IP via DHCP. The subnet we chose for this network is 10.0.0.0/24. The allocation pool goes from 10.0.0.2 to 10.0.0.254 and it can’t be enlarged any further. This is a problem because we need IP addresses for many more instances.

In this article we explain how we successfully enlarged this subnet to a wider range: 10.0.0.0/16. This operation is not a feature supported by Neutron in Juno, so we show how to hack into Neutron internals. We were able to successfully enlarge the subnet and modify the allocation pool, without interrupting the service for the existing instances.

In the following we assume that the network we are talking about has only 1 router, however this procedure can be easily extended to more complex setups.

What you should know about Neutron is that a Neutron network has two important namespaces on the OpenStack network node.

  • The qrouter is the router namespace. In our setup one interface is attached to the private network we need to enlarge and a second interface is attached to the external physical network.
  • The qdhcp namespace has only one interface, to the private network. On your OpenStack network node you will find a dnsmasq process running, bound to this interface, to provide IP addresses via DHCP.

Neutron Architecture

In the figure Neutron Architecture we try to give an overview of the overall system. A virtual machine (VM) can run on any remote compute node. The compute node has an Open vSwitch process running that collects the traffic from the VM and, with proper VXLAN encapsulation, delivers it to the network node. The Open vSwitch at the network node has a bridge containing both the qrouter namespace internal interface and the qdhcp namespace interface; this makes the VMs see both the default gateway and the DHCP server on the virtual L2 network. The qrouter namespace has a second interface to the external network.

Step 1: hack the Neutron database

In the Neutron database, look for the subnet; you can easily find it in the subnets table by matching the service tenant id:

select * from subnets WHERE tenant_id='d447c836b6934dfab41a03f1ff96d879';

Take note of the id (which in this table is the subnet_id) and the network_id of the subnet. In our example we had these values:

id (subnet_id) = 2e06c039-b715-4020-b609-779954fa4399
network_id = 1dc116e9-1ec9-49f6-9d92-4483edfefc9c
tenant_id = d447c836b6934dfab41a03f1ff96d879

Now let’s look into the routers database table:

select * from routers WHERE tenant_id='d447c836b6934dfab41a03f1ff96d879';

Again filter for the service tenant. We take note of the router ID.

 id (router_id) = aba1e526-05ca-4aca-9a80-01601cdee79d

At this point we have all the information we need to enlarge the subnet in the Neutron database.

update subnets set cidr='NET/MASK' WHERE id='subnet_id';

So in our example:

update subnets set cidr='10.0.0.0/16' WHERE id='2e06c039-b715-4020-b609-779954fa4399';

Nothing will happen immediately after you update the values in the Neutron mysql database. You could reboot your network node and Neutron would rebuild the virtual routers with the new database values. However, we show a better solution to avoid downtime.
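
One caution worth making explicit: widening the CIDR this way is only safe when the new prefix fully contains the old one, so every already-allocated address and the existing allocation pool remain valid. A quick check with Python's ipaddress module (our addition, not part of the original procedure) captures the rule:

```python
# Verify that widening old_cidr to new_cidr keeps every existing
# address valid: the old network must be a subnet of the new one.
import ipaddress

def can_widen(old_cidr, new_cidr):
    old = ipaddress.ip_network(old_cidr)
    new = ipaddress.ip_network(new_cidr)
    return old.subnet_of(new)

print(can_widen("10.0.0.0/24", "10.0.0.0/16"))  # True: safe to widen
print(can_widen("10.0.0.0/24", "10.1.0.0/16"))  # False: existing IPs would fall outside
```

In our case 10.0.0.0/24 is a subnet of 10.0.0.0/16, so the database update cannot invalidate any address already handed out.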

Step 2: Update the interface of the qrouter namespace

On the network node there is a namespace qrouter-<router_id>. Let’s have a look at the interfaces using iproute2:

sudo ip netns exec qrouter-<router_id> ip addr show

With the values in our example:

sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr show

You will see the typical Linux output with all the interfaces that live in this namespace. Take note of the interface name with the address 10.0.0.1/24 that we want to change, in our case

 qr-396e87de-4b

Now that we know the interface name we can change IP address and mask:

sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr add 10.0.0.1/16 dev qr-396e87de-4b
sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr del 10.0.0.1/24 dev qr-396e87de-4b

Step 3: Update the interface of the qdhcp namespace

Still on the network node there is a namespace qdhcp-<network_id>. Exactly as we did for the qrouter namespace, we find the interface name and change the IP address to the updated netmask.

sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr add 10.0.0.2/16 dev tapadebc2ff-10
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr del 10.0.0.2/24 dev tapadebc2ff-10
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show

The dnsmasq process bound to the interface in the qdhcp namespace is smart enough to detect the change in the interface configuration automatically. This means that from this point on, new instances will get a /16 netmask via DHCP.

Step 4: (Optional) Adjust the subnet name in Horizon

We had named the subnet 10.0.0.0/24. Purely for cosmetics, we logged into the Horizon web interface as admin and changed the name of the subnet to 10.0.0.0/16.

Step 5: Adjust the allocation pool for the subnet

Now that the subnet is wider, the neutron client will let you configure a wider allocation pool. First check the existing allocation pool:

$ neutron subnet-list | grep 2e06c039-b715-4020-b609-779954fa4399

| 2e06c039-b715-4020-b609-779954fa4399 | 10.0.0.0/16     | 10.0.0.0/16      | {"start": "10.0.0.2", "end": "10.0.0.254"}           |

You can easily resize the allocation pool like this:

neutron subnet-update 2e06c039-b715-4020-b609-779954fa4399 --allocation-pool start='10.0.0.2',end='10.0.255.254'

Step 6: Check status of the VMs

At this point the new instances will get an IP address from the new allocation pool.

As for the existing instances, they will continue to work with the /24 address mask. If rebooted, they will get the same IP address via DHCP but with the new address mask. Also, when the DHCP lease expires, depending on the DHCP client implementation, they will hopefully pick up the updated netmask. This is not the case with the default Ubuntu dhclient, which will not refresh the netmask when the IP address offered by the DHCP server does not change.

The worst case scenario is a machine that keeps the old /24 address mask for a long time. Its outbound traffic to other machines in the private network may take a suboptimal route through the network node, which is used as the default gateway.

Conclusion

We successfully expanded a Neutron network to a wider IP range without service interruption. By understanding Neutron internals, it is possible to make changes that go beyond the features Neutron supports. It is very important to understand how the values in the Neutron database are used to create the network namespaces.

We understood that a better design for our cloud would be to have a default Neutron network per tenant, instead of a shared default network for all tenants.

by Saverio Proto at December 12, 2018 02:14 PM

Aptira

Software is Eating the Network. Part 3 – Software Defined Networking (SDN)

Aprira: Software is Eating the Network. Software Defined Networking (SDN)

In this and the next post, we’ll be covering the last two components of the Open Network Software domain of Open Networking. This post is about Software Defined Networking, and the next will be on Network Functions Virtualisation.

By the mid-2000s, as massive demand for network services drove enterprises to build larger and larger networks, a paradox emerged: product innovation at the component level was still heavily dominated by proprietary vendor architectures, while at the same time scaling and operationalising networks at the levels being driven by corporations like Google and Amazon, and even the US Government, was problematic – and expensive.

Once you bought a piece of networking hardware, you didn't really have the freedom to re-program it. … You would buy a router from Cisco and it would come with whatever protocols it supported and that’s what you ran.

Scott Shenker, UC Berkeley computer science professor and former Xerox PARC researcher (https://www.wired.com/2012/04/nicira/)

At one level, there is good reason for this: operational stability.  

If you buy switches from a company, you expect them to work. A networking company doesn't want to give you access and have you come running to them when your network melts down because of something you did.

Scott Shenker

On the other hand, this characteristic gave the network vendors enormous control and leverage over their marketplaces. Companies that bought their products were highly dependent on these vendors, both for feature innovation and to address operational issues.

A small number of companies sought to address these dependencies. In 2005, Google started to build its own networking hardware, in part because it needed more control over how the hardware operated. Again in 2010, Google was building its own networking hardware for its “G-Scale Network”. But Google was less interested in building its own network hardware than it was in driving the software-isation of the network stack.

It’s not hard to build networking hardware. What’s hard is to build the software itself as well.

Urs Hölzle, Google’s SVP of Infrastructure

In 2008, Google had deployed an internally developed load balancer called Maglev, entirely based on software running on commodity hardware.

Although Google was quite capable of developing software components internally, when it came to the G-Scale Network it would turn to an outside development effort, changing the balance between hardware and software forever.

In 2003 Martin Casado, while studying at Stanford, began to develop OpenFlow, software that enabled a new type of network that exists only as software and that you can control independently of the physical switches and routers running beneath it. Casado wanted a network that can be programmed like a general-purpose computer and that could work with any networking hardware. 

Anyone can buy a bunch of computers and throw a bunch of software engineers at them and come up with something awesome, and I think you should be able to do the same with the network. We've come up with a network architecture that lets you have the flexibility you have with computers, and it works with any networking hardware.

Martin Casado (https://www.wired.com/2012/04/nicira/)

This approach is called Software Defined Networking (SDN). With SDN, the network hardware became less relevant.  Large users were able either to develop software solutions on commodity hardware or go direct to manufacturers in Asia (who also supplied big network vendors) and buy the basic network hardware directly. 

Software Defined Networking purposefully combines the flexibility of software development with the raw power of network devices to produce an intelligent network fabric. 

In 2011, the Open Networking Foundation (ONF) was founded to transfer control of OpenFlow to a not-for-profit organization, and the ONF released version 1.1 of the OpenFlow protocol on 28 February 2011. Interest boomed and prompted new product lines from vendor startups offering SDN capabilities on open hardware-based equipment. 

Google rolled out its “G-Scale Network“, entirely based on OpenFlow by 2012, and many of the top-ranked internet companies implemented SDN-based networks in parallel or soon after. 

Based on SDN, software now plays a controlling part in the open network and enables a new set of applications to be built that leverage this network fabric. With the rollout of SDN as a major driver of network design, deployment, operation and evolution, we see the requirements for success changing. Successful networks are now less about bolting together nodes and links with (relatively) pre-defined characteristics.  Successful networks now start to take on aspects of the Software Development Lifecycle (SDLC).  This change has huge implications for enterprises of all sizes because it completely changes the fundamental paradigm of how distributed computing resources are deployed. 

Hot on the heels of SDN came another key technology within Open Networking, i.e. Network Functions Virtualisation (NFV). NFV only further entrenches the software-isation of computing infrastructure and the need for a ‘software first’ paradigm. 

We’ll see more about this in our next post. Stay tuned.

Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Software is Eating the Network. Part 3 – Software Defined Networking (SDN) appeared first on Aptira.

by Adam Russell at December 12, 2018 10:30 AM

OpenStack Superuser

Containers on a Fortnite scale

SEATTLE — You might think that the only thing more exhilarating than playing Fortnite is being responsible for uptime on a game with 200 million registered users that raked in $300 million in revenue in a single month.

You’d be wrong. The game credited with being so addictively fun it’s sending some users into rehab isn’t keeping those who run it up at night.

“It turns out that scaling a video game isn’t that different than scaling any other successful product,” says Paul Sharpe, principal cloud engineering developer at Epic Games, maker of Fortnite, who shared his story about the move to Kubernetes at the press and analyst briefing at KubeCon + CloudNativeCon North America. “It’s the same sets of challenges.”

Sharpe’s tenure at Epic — a little over a year — coincides with the explosive growth of the game. His previous tours of duty in tech include Twitter, Amazon Web Services and Amazon.

Just like a lot of businesses, he says that modern game development is “actually a whole lot of micro-services,” adding that Epic was already heavily invested in AWS, “all in on public cloud” and employed containerization tech such as Docker. Sharpe describes Epic as a “big Linux shop” where the micro-services (some REST-ful, others not) are written in a number of languages, including Java and Scala.

“Moving to Kubernetes was a natural evolution of our workloads,” Sharpe says. “It basically comes down to trying to improve our developers’ lives.” Right now, the devs have to do a lot manually, including managing things like EC2 instances and load balancers. “K8s lets us provide abstractions so they don’t have to deal with that directly and focus on problems they need to solve.” They currently use Amazon Elastic Container Service for Kubernetes (EKS).

In terms of other cloud native tech, Sharpe says they’re currently working with Prometheus, FluentD, InfluxData and Telegraf. What’s next? As observability becomes a major focus, his team is “very interested” in OpenTracing as well as Jaeger and Zipkin but “haven’t fully decided on that yet.” In terms of metrics, they’re just getting started with Kubernetes, the clusters right now are “pretty small” but Epic is “ramping up into production as we speak.”

“We’re a big game company but we’re small in the amount of resources that we have to manage these kinds of things,” he says.

 

The Linux Foundation provided travel and accommodation.

Cover photo // CC BY NC

The post Containers on a Fortnite scale appeared first on Superuser.

by Nicole Martinelli at December 12, 2018 01:03 AM

December 11, 2018

OpenStack Superuser

Where the cloud native approach is taking NFV architecture for 5G

Network functions virtualization is an exciting technology. By integrating with software-defined networking, NFV offers huge enhancements for cloud service providers (CSPs) in their journey to enable 5G for customers.

This year, leading CSPs (Telstra, Deutsche Telekom, AT&T, Verizon, Telefonica) took a great leap forward, launching 5G internet for selected cities. The CSPs were well supported by leading vendors (Ericsson, VMware, Nokia) who had offered and deployed NFV/SDN, and the ETSI MEC, Linux Foundation and other communities have provided great support. However, more work needs to be done, because 5G internet will offer even more advanced features and fast connectivity to support innovative technologies (autonomous cars, augmented reality, virtual reality, gaming, the internet of things, blockchain, etc.).

Additionally, CSPs have started evaluating mobile edge computing (MEC) architecture to get closer to digital devices, with edge servers providing instant computing and processing of the data those devices generate. This overall architecture definitely needs end-to-end automation and must be able to support services dedicated to a specific spectrum or use case (network slicing, for example).

So, 5G is supposed to bring advances to an ever-demanding telecom network, with NFV architecture forming the base. But despite the progress in adopting NFV and transforming the network to use virtualized network functions, there’s work to be done around key development and operations areas that still call the future of agile networks into question.

With NFV architecture applied, the majority of functions and operations in the telecom network are driven and handled by software applications. This is really the key factor in the progression of NFV: it makes it possible to control the overall network (SDN) and its functions with software applications. Still, progress has been a bit slower than expected, four years after the inception of the NFV concept. However, as containerization has grown, enterprises have started to embrace the cloud-native approach en masse.

Let’s discuss key areas and what progress has been made around cloud-native application methodologies in NFV architecture.

Cloud native VNFs: Containerized, micro-services and dynamic orchestration

Recently, Nokia released the latest version of its NFV cloud platform, CloudBand (CB 19). The focus of this release was to provide an enterprise-ready platform for edge computing deployments. To support edge platforms, end-to-end automation and consistency in updates, CB 19 will leverage containers to host virtual network functions (VNFs), managed using Kubernetes. Virtual machines, which were typically used to host VNFs, are not completely out of the picture, but these VMs will now be updated more quickly thanks to OpenStack integration in CB 19.

This release shows how deploying VNFs in containers is a crucial step toward further innovations and use cases. It enables the cloud-native approach, wherein monolithic applications are fragmented into micro-services that can be developed, scaled and patched independently and communicate among themselves through rich APIs. Kubernetes, which has evolved to manage such workloads, provides dynamic orchestration for VNFs. Another implementation has already been discussed by the OpenStack community as a proposed feature for an upcoming Tacker release. We can expect more work by NFV vendors and will see integration available in upcoming releases.

Continuous integration/continuous deployment

Industry has rapidly adopted cloud-native application development methodologies, which came into existence to continuously upgrade deployed applications and introduce new services quickly. As the number of connected devices grows, each device generates a huge amount of data, and enterprises are building AI-based smart applications to analyze that rich data stream. As Michael Dell recently posted: “Companies will succeed and fail based on their ability to translate data into insights at record speed, which means enabling the movement of relevant data, computing power and algorithms securely and seamlessly across the entire ecosystem. The work, innovation and investment to support that is happening now.”

To keep up with rapid application development and upgrades, a CI/CD approach is required for VNFs. VNFs are deployed not only at central data centers but also at the edge of the network and on sliced networks (for specific use cases within the same network). Based on such an architecture, network services will be channeled to specific regions and specific use cases. All of them need end-to-end deployment automation, continuous enhancements and rapid launches of new services to stay competitive and in sync with customer demands.

End-to-end continuous monitoring/testing

NFV-based 5G networks will be a complicated mesh that constantly generates data from users. When monitoring these networks, it will be crucial to keep an eye out for network glitches or possible attacks. In addition, the overall network architecture will be built from solutions and resources provided by diverse vendors, with applications and architecture orchestrated using a variety of frameworks with different workflows. All these factors affect the performance of service delivery to end consumers, which can hamper business cases where latency and bandwidth requirements are critical. Expect solutions to be built for active testing and monitoring; such testing solutions might find space in the market in the upcoming year and proliferate further as networks transition to NFV.

About the author

Sagar Nangare, a digital strategist at Calsoft Inc., is a marketing professional with over seven years of experience of strategic consulting, content marketing and digital marketing. He’s an expert in technology domains like security, networking, cloud, virtualization, storage and IoT.

Check out the NFV sessions from the recent Berlin Summit.

The post Where the cloud native approach is taking NFV architecture for 5G appeared first on Superuser.

by Sagar Nangare at December 11, 2018 03:01 PM

Trinh Nguyen

Searchlight weekly report - Stein R-19 & R-18



For the last two weeks, Stein R-19 and R-18, we focused on fixing a bug that fails most of Searchlight's functional tests. It's blocking Searchlight because if the tests don't pass, we can't merge anything new.

We identified the reason: the ElasticSearch instance for the functional tests cannot be started. I'm trying to tune the functional test setup [1] to make it work, but it seems there is still more work to do. I'm also working with the people at the OpenStack Infrastructure team to see what the real problem could be. Hopefully, we can fix it by the end of this week.

[1] https://review.openstack.org/#/c/622871/

by Trinh Nguyen (noreply@blogger.com) at December 11, 2018 05:29 AM

December 10, 2018

OpenStack Superuser

One to rule them all: OpenStack projects unify mailing lists

This time, it’s not about invisibility. In an effort to increase participation and surface the most relevant conversations and contributions, the OpenStack project mailing lists have merged. That means that the openstack, openstack-dev, openstack-sigs and openstack-operators mailing lists have been replaced by a new openstack-discuss at lists.openstack.org mailing list.

If you were signed up to the previous lists, you’ll still need to join this one. (The reason: part netiquette, part legal.) The new list is open to all discussions or questions about use, operation or future development of OpenStack.  If you’re unsure about how to tag your topic to make sure your voice is heard, check out the most recent archive of posts and the tagging guidelines here.

What’s behind the effort to combine the lists?

“For one, the above list behavior change to address DMARC/DKIM issues is a good reason to want a new list; making those changes to any of the existing lists is already likely to be disruptive anyway as subscribers may be relying on the subject mangling for purposes of filtering list traffic,” writes Jeremy Stanley, infrastructure engineer for the OSF.  “We have many suspected defunct subscribers who are not bouncing…so this is a good opportunity to clean up the subscriber list and reduce the overall amount of email unnecessarily sent by the server.”

There’s another reason behind the mailing lists coming together, writes Chris Dent. “The hope is to break down some of the artificial and arbitrary boundaries between developers, users, operators, deployers and other ‘stakeholders’ in the community. We need and want to blur the boundaries. Everyone should be using, everyone can be developing.”

Dent, a member of the Technical Committee, longtime contributor and prolific chronicler of the OpenStack community, offers up a few helpful reminders about how to make mailing lists work better.

“You’re trying to make the archive readable for the people who come later. It’s the same as code: you’re not trying to make it maintainable by you. It’s not about you. It’s about other people. Who aren’t there right now.”

The post One to rule them all: OpenStack projects unify mailing lists appeared first on Superuser.

by Superuser at December 10, 2018 02:04 PM

December 07, 2018

OpenStack Superuser

What’s happening now with edge and OpenStack

At the recent Berlin Summit, there was a dedicated track for edge computing, with numerous presentations and panel discussions that were recorded. If you’d like to catch up or see some sessions again, check out the videos on the OpenStack website.

In parallel to the conference, the Forum took place, with 40-minute working sessions for developers, operators and users to meet and discuss new requirements, challenges and pain points to address.

Let’s start with a recap of  the OSF Edge Computing Group and Edge Working Group sessions. (If you’re new to the activities of this group you may want to read my notes on the Denver PTG to catch up on the community’s and the group’s work on defining reference architectures for edge-use cases.)

During the Forum, we continued to discuss the minimum viable product (MVP) architecture that we started on at the last PTG. Due to the limited time available, we settled some basics and agreed on action items to follow up on. Session attendees agreed that the MVP architecture is an important first step, and we will keep its scope limited to the current OpenStack services listed on the wiki page capturing the details. Although there’s interest in adding further services such as Ironic or Qinling, they were tabled for later.

The Edge WG is actively working on capturing edge computing use cases in order to understand the requirements better and to work together with the OpenStack and StarlingX projects on design and implementation based on the input the group has been collecting. We had a session to identify which use cases the group should focus on; the most interest was expressed for immediate action on vRAN and edge cloud, uCPE and industrial control.

The group is actively working on mapping the MVP architecture options to the use cases it has identified and on getting more details on the ones raised during the Forum session. If you are interested in participating in these activities, please see the details of the group’s weekly meetings.

While the MVP architecture work focuses on a minimalistic view to provide a reference architecture with the covered services prepared for edge use cases, work is ongoing in parallel in several OpenStack projects. (You can find notes on the Forum etherpads on the progress of projects such as Cinder, Ironic, Kolla-Ansible and TripleO.) The general consensus of the project discussions was that the services are in good shape for edge requirements and there is a clear path forward, for example improving availability zone functionality or remote management of bare metal nodes.

With all the work ongoing in the projects as well as in the Edge WG, the expectation is that we’ll be able to move easily to the next phases of the MVP architectures when the working group is ready. Both the group and the projects are looking for contributors to identify further requirements and use cases, or for implementation and testing work.

Testing will be a crucial area for edge, and we’re looking into both cross-project and cross-community collaborations with projects such as OPNFV and Akraino.

While we didn’t have a Keystone-specific Forum session for edge this time, a small group came together to discuss next steps with federation. We’re converging towards some generic feature additions to Keystone based on the Athenz plugin from Oath. (Here’s a Keystone summary from Lance Bragstad that includes plans related to edge.)

We had a couple of sessions about StarlingX at the Summit, both in the conference tracks and at the Forum. You can check out the project update and other relevant sessions among the Summit videos. Because the StarlingX community works closely with the Edge WG as well as the relevant OpenStack project teams, at the Forum we organized sessions focusing on some specific items for planning future work and increasing understanding of the requirements for the project.

The team had a session on IoT to talk about the list of devices to consider and the requirements systems need to address in this space. The session also identified a collaboration option between StarlingX, IoTronic and Ironic when it comes to realizing and testing use cases.

Putting more emphasis on containers at the edge, the team also had a session on containerized application requirements with a focus on Kubernetes clusters. During the session we talked about areas like container networking, multi-tenancy and persistent storage to see what options we have for them and what’s missing today to cover that area. The StarlingX community will be focusing more on containerization for the upcoming releases, so feedback and ideas are important.

One more session to mention is the “Ask me anything about StarlingX” held at the Forum, where experts from the community offered help to people who are new to or have questions about the project. The session was well attended, and questions focused on practical angles like footprint and memory consumption.

Get involved

If you’d like to participate in these activities, you can dial in to the Edge WG weekly calls or the weekly use cases calls, check the StarlingX sub-project team calls, find further material on the website about how to contribute, or jump on IRC for OpenStack project team meetings in the area of your interest.

Cover photo // CC BY NC

The post What’s happening now with edge and OpenStack appeared first on Superuser.

by Ildiko Vancsa at December 07, 2018 02:02 PM

Chris Dent

Placement Update 18-49

This will be the last placement update of the year. I'll be travelling next Friday and after that we'll be deep in the December lull. I'll catch us up next on January 4th.

Most Important

As last week, progress continues on the work in ansible, puppet/tripleo, kolla, loci to package placement up and establish upgrade processes. All of these things need review (see below). Work on GPU reshaping in virt drivers is getting close.

What's Changed

  • The perfload job which used to run in the nova-next job now has its own job, running on each change. This may be of general interest because it runs placement "live" but without devstack, resulting in job runs of less than 4 minutes.

  • We've decided to go ahead with the simple os-resource-classes idea, so a repo is being created.

Slow Reviews

(Reviews which need additional attention because they have unresolved questions.)

  • https://review.openstack.org/#/c/619126/ Set root_provider_id in the database

    This has some indecision because it does a data migration within schema migrations. For this particular case this is safe and quick, but there's concern that it softens a potentially useful boundary between schema and data migrations.
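To make the concern concrete: the migration backfills data (setting each provider's root to itself where unset) inside what is otherwise a schema-only step. The following is an illustrative sketch only, using an in-memory SQLite table rather than placement's actual migration code; the simplified `resource_providers` schema here is an assumption for demonstration.

```python
import sqlite3

# Simplified stand-in for the resource_providers table, where legacy rows
# (created before nested providers existed) have a NULL root_provider_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers "
    "(id INTEGER PRIMARY KEY, root_provider_id INTEGER)"
)
conn.executemany(
    "INSERT INTO resource_providers (id, root_provider_id) VALUES (?, ?)",
    [(1, None), (2, None), (3, 3)],  # rows 1 and 2 are pre-nested legacy rows
)

# The data-migration step under discussion: backfill each legacy row's
# root_provider_id to its own id, leaving already-set rows alone.
conn.execute(
    "UPDATE resource_providers SET root_provider_id = id "
    "WHERE root_provider_id IS NULL"
)

rows = conn.execute(
    "SELECT id, root_provider_id FROM resource_providers ORDER BY id"
).fetchall()
print(rows)  # → [(1, 1), (2, 2), (3, 3)]
```

The update itself is cheap and idempotent, which is why it is safe in this particular case; the debate is about whether embedding it in a schema migration sets a precedent.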

Bugs

Interesting Bugs

(Bugs that are sneaky and interesting and need someone to pick them up.)

  • https://bugs.launchpad.net/nova/+bug/1805858 placement/objects/resource_provider.py missing test coverage for several methods

    This is likely the result of the extraction. Tests in nova's test_servers and friends probably covered some of this stuff, but now we need placement-specific tests.

  • https://bugs.launchpad.net/nova/+bug/1804453 maximum recursion possible while setting aggregates in placement

    This can only happen under very heavy load with a very low number of placement processes, but the code that fails should probably change anyway: it's a potentially infinite loop with no safety breakout.
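The usual shape of such a fix is a bounded retry: keep the retry-on-conflict behavior but cap the number of attempts. This is a hypothetical sketch, not placement's actual code; the `ConcurrentUpdate` exception and `attempt_once` callable are illustrative names.

```python
class ConcurrentUpdate(Exception):
    """Raised when another writer changed the record's generation first."""


def set_aggregates(attempt_once, max_retries=10):
    """Retry attempt_once() on generation conflicts, but never forever.

    Without the max_retries cap this is the potentially infinite
    loop described in the bug.
    """
    for attempt in range(max_retries):
        try:
            return attempt_once()
        except ConcurrentUpdate:
            continue  # another writer won the race; reload state and retry
    raise RuntimeError(
        "gave up after %d concurrent-update retries" % max_retries)
```

Under heavy contention a caller now gets a clear error instead of unbounded recursion.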

Specs

Spec freeze is milestone 2, the week of January 7th. There was going to be a spec review sprint next week but it was agreed that people are already sufficiently busy. This will certainly mean that some of these specs do not get accepted for this cycle.

None of the specs listed last week have merged.

Main Themes

Making Nested Useful

Progress continues on gpu-reshaping for libvirt and xen:

Also making use of nested is bandwidth-resource-provider:

Eric's in the process of doing lots of cleanups to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

Extraction

The extraction etherpad is starting to contain more strikethrough text than not. Progress is being made. The main tasks are the reshaper work mentioned above and the work to get deployment tools operating with an extracted placement:

Documentation tuneups:

The functional tests in nova that use extracted placement are working but not yet merged. A child of that patch removes the placement code. Further work will be required to tune up the various pieces of documentation in nova that reference placement.

Other

There are currently only 8 open changes in placement itself. Most of the time-critical work is happening elsewhere (notably the deployment tool changes listed above).

Of those placement changes the database-related ones from Tetsuro are the most important.

Outside of placement:

End

In case it hasn't been clear: things being listed here is an explicit invitation (even plea) for you to help out by reviewing or fixing. Thank you.

by Chris Dent at December 07, 2018 12:56 PM

CERN Tech Blog

OpenStack Placement requests - The Uphill and Downhill

Overview: In this blogpost I will describe the Placement scalability issue that we identified when OpenStack Nova was upgraded to Rocky, and all the work done to mitigate this problem. OpenStack Placement: Placement is a service that tracks the inventory and allocations of resource providers (usually compute nodes). The resource tracker runs on each compute node and sends the inventory and allocations information to Placement through a periodic task.

by CERN (techblog-contact@cern.ch) at December 07, 2018 10:28 AM

Aptira

Software is Eating the Network. Part 2 – Open Source Software

Aptira: Software is Eating the Network. Open Source Software

If we accept that Software is Eating the Network, then we accept that understanding and managing software within an Open Networking solution is critical to successful implementation. We now turn to Open Source Software, a component of Open Networking that is a critical multiplier of the benefits that are derived from the components we already covered, and a critical driver of the success and capabilities of Open Networking. 

Open Source Software has literally exploded into the software marketplace and transformed the approach to Open Networking solutions development. For example, take Github.com, the development platform that hosts a range of software projects including open-source and was recently bought by Microsoft. Github.com was founded in 2008. 

The growth in projects on Github.com is shown in the chart below:  

Aptira: Open Source Software. Github Growth Graph

And the trends in that chart have continued: at end 2016, Github had 25 million repos, and 67 million at end 2017. Not all of these are Open Source by any means: in 2016, Github hosted 2.8 million open-source projects that explicitly listed Open Source license agreements in the project. 

As a capstone to this massive growth in Open Source, on 4th June 2018, Microsoft announced that it was buying Github for US$7.5 billion. Microsoft's former CEO, Steve Ballmer, had once called Linux (and Open Source generally) a "cancer", back in 2001. Microsoft, a famously proprietary vendor, now finds most of its growth in selling Cloud services to developers, and Open Source is clearly now strategic. 

What produced this massive growth in Open Source? As with networking products, frustration with vendor driven proprietary software. 

Software as commercial product paralleled the development of hardware. Although free sharing of software code was common in the early days of software, commercial pressures sidelined this in the 1970’s and 1980’s. All the commercial software was proprietary: it was owned by the vendor who went to great lengths (often extreme) to prevent that intellectual property being used without compensation. 

Just recently, whilst working with a well-known tertiary institution, I was reminded of how extreme software licence management was in some sectors. This institution was still dealing with an ancient licencing model based on the hardware licence “dongle” that was provided by a software vendor and had to be plugged into the computer on which the software was to be installed & run.  These devices often used up a port (usually the parallel printer port) and were often buggy.  But just the sheer unmanageability of these devices made them a sore point with software users and administrators. 

Vendors argued that their software reduced cost by spreading development investment over many customers, and that a software product based on broad requirements inputs from many customers resulted in a more functional and valuable outcome.   

But it was still only developed by one vendor, and the rate of innovation was directly related to the productivity of that one firm and a function of how much of their revenue the vendor wanted to invest in development and support. Licence costs for software became very expensive, and some software houses very profitable. It was all too easy for the vendor to drift into a “rent seeking” approach of protecting ongoing revenue without necessarily funding ongoing development, which undermined the whole economics of the business model. 

Some thought that it was possible to do better:  

In 1997, Eric Raymond published ‘The Cathedral and the Bazaar’, a reflective analysis of the hacker community and free software principles – (Wikipedia).  

Raymond’s view was that different software development situations demanded different business / development models. In Raymond’s view, some software requirements needed the structure and formality of the Cathedral, but most could be satisfied in a bazaar-like market where products were traded easily amongst a variety of stakeholders. 

The Open Source Initiative was founded in February 1998 to encourage use of the new term and evangelize open-source principles, and in 1998, Netscape released the source code to its Navigator browser as “Free Software”. 

The principles of Open Source software resonate with the philosophies of the Agile movement, which was emerging in parallel: 

  • Users should be treated as co-developers 
  • Early releases 
  • Frequent integration 
  • Several versions 
  • High modularization 
  • Dynamic decision-making structure

Open source was more than just a software licencing mechanism. Open source established new business models for commercial software enterprises and drove a materially different model for the software development process itself. Numerous companies around the globe have successfully grown businesses that leverage open source projects with services, "hardened" versions and support fees. 

What it meant was that a code base could be developed collaboratively by many parties who could all share in the benefits, without a single vendor controlling the pace or direction of the product evolution. 

Since 2011, major companies have got on board the Open Source bandwagon, developing operationally critical software as open source or releasing in-house developments to open source communities. 

With the brand recognition and financial strength of companies like Google, Facebook and AT&T endorsing the open-source approach, and operationalising business critical functionality using open source software, the Open Source approach became more mainstream. 

This enables many benefits but brings with it many risks and challenges. 

In the next 2 posts we will look at two Open Networking software families that were enabled by the Open Source approach and that in themselves will highlight the many benefits of this approach. 

Stay tuned. 


The post Software is Eating the Network. Part 2 – Open Source Software appeared first on Aptira.

by Adam Russell at December 07, 2018 04:40 AM

December 06, 2018

OpenStack Superuser

Contain your excitement: Kata turns one!

A year ago, the Kata Containers project was officially announced at KubeCon + CloudNativeCon in Austin, Texas. On the first anniversary of the Kata project, here’s a highlight of the activities and progress made in the global community.

Community activities

In 2018, the Kata community presented technical updates and hosted gatherings at several global events including KubeCon + CloudNativeCon Austin, KubeCon + CloudNativeCon Copenhagen, OpenStack Summit Vancouver, DockerCon San Francisco, LC3 China, Open Source Summit Vancouver, Container Camp UK, DevSecCon Boston, Open Source Summit & KVM Forum Scotland, OpenStack Summit Berlin and DevOpsCon Munich. Kata has also been featured at several OpenInfra Days, OpenStack Days and other container-focused meetups around the world.

The Kata team at launch.

Watch some of the many Kata presentations:

The Kata Containers project joined the Open Container Initiative (OCI) in March as part of the larger announcement that the OpenStack Foundation became a member of OCI. OCI is an open-source initiative that aims to create industry standards around container formats and runtimes. The OCI specs guarantee that container technology can be open and vendor neutral and become a cornerstone of future computing infrastructure. The Kata Containers community continues to work closely with the OCI and Kubernetes communities to ensure compatibility and regularly tests Kata Containers across AWS, Azure, GCP and OpenStack public cloud environments, as well as across all major Linux distributions.

In July, Clear Linux OS announced support for Kata Containers, allowing users to leverage the benefits of Kata Containers in the Clear Linux OS by easily adding a bundle and optionally setting Kata Containers as their default containers option. Through collaboration with Canonical, we also posted a Kata Containers snap image in the Snap store for use with Ubuntu as well as many other Linux distributions. In Q1 2019, we expect to see Kata in the official openSUSE/SLE repositories.

The first Architecture Committee elections were held in September and the community welcomed Eric Ernst (Intel) and Jon Olson (Google) to join existing members Samuel Ortiz (Intel), Xu Wang (Hyper), and Wei Zhang (Huawei).

In October, the local China community hosted a Kata meetup in Beijing designed for large cloud providers including Alibaba, Baidu, Tencent and more to share adoption plans and feedback for the Kata Containers roadmap. The event, organized by Intel, Hyper and Huawei, shared some early prototypes of Kata Containers on NFV, edge computing, new VMM NEMU and Cloud Containers Instance products. “Kata Containers solves the problem of container isolation by integrating VM and container technologies,” said Bai Yu, senior R&D engineer at Baidu Security. “We apply Kata Containers to DuEdge edge network computing products, which greatly simplifies our resource isolation and container security design in multi-tenant scenarios.”

Development

In May, during the Vancouver OpenStack Summit, Kata landed its 1.0 release. This first release completed the merger of Intel’s Clear Containers and Hyper’s runV technologies and delivered an OCI compatible runtime with seamless integration for container ecosystem technologies like Docker and Kubernetes.

Since September, the project delivered several releases which introduced features to help cloud service providers (CSPs) and others who plan to deploy Kata into production. The 1.3.0 and 1.2.2 stable releases featured Network and Memory hotplug in order to better support CSP customers’ running production environments. The community also continued its pursuit of cross-architectural design by adding more support for ARM64 as well as Intel(R) Graphics Virtualization.

Most recently the project has made some rapid advancements with the 1.4.0 release which offers better logging, ipvlan/macvlan support through TC mirroring, and NEMU hypervisor support. View the full list of 1.4 features in this blog post.

The 1.5 release is currently planned for mid January 2019, and will offer support for containerd v2 shim among other features.

Since its launch, Kata Containers has scaled to include support for major architectures, including AMD64, ARM and IBM p-series. The Kata community is fortunate to have input from some of the world’s largest cloud service providers, operating system vendors and telecom equipment makers, such that the project can better serve their internal infrastructure needs as well as enterprise customers.

Up next

The Kata Containers community will again be at KubeCon + CloudNativeCon next week to talk about the latest v1.4 release and use cases for operating secure containers. Kata will have a booth presence to provide more visibility for the project and a central location for outreach, education and collaborative discussions. If you’re attending KubeCon, come join us at booth S17 in the expo. Xu Wang of Hyper.sh, Eric Ernst of Intel, and Jon Olson of Google—members of the Architecture Committee—will join other community leaders at the event.

Kata Containers will be featured in nearly a dozen sessions, notably:

Looking ahead to next year, the Kata community will hold two open meetings on December 17, 2018 to discuss community building, advocacy, events and marketing plans for 2019. Anyone is welcome to participate. Meeting details can be found here.

Kata Containers is a fully open-source project: check out Kata Containers on GitHub and join the channels below to find out how you can contribute.

As we celebrate one year, Kata community leaders and contributors reflect on the project’s growth and impact.

“It’s been great working on Kata over the past year. I love learning, and with this project, we touch so many technical domains while living in a very dynamic ecosystem.  Even an older, mature technology like virtualization is super exciting. For me, it’s the input and contributions from the community which make the project great. Looking forward to scaling further in 2019!”

— Eric Ernst (@egernst), Member of Kata Containers Architecture Committee

“My Kata journey began with a marathon-like series meetings, in which we decided to merge runV and Clear Containers, in September 2017 in San Francisco, Denver, and Portland. Then we had impressive preparing meetings in Austin/Oct, Sydney/Nov, and again Austin/Dec, and announced Kata Containers together with Intel and OpenStack Foundation. And this year, in Copenhagen,  Vancouver, Beijing, Taipei, Berlin, and soon Seattle, it is Kata containers that changed our company’s and my track. Thanks for the tremendous efforts from foundation and Intel, Huawei, Google, ARM, IBM, ZTE, etc., the community has made the project become much stabler and faster, and I believe the project will become even better in the next year.”

— Xu Wang (@gnawux), Member of Kata Containers Architecture Committee

“Long long time ago, about 2 years before Kata was born, I noticed the announcement of Intel Clear Containers(CC) and Hyper runV projects, and had been attracted by the idea at once. It solves some real problems for containers, in a smart and practical way. Since then,  I joined the community and worked a lot together with Hyper and Intel. The merge of CC and runV was another exciting news as it consolidated efforts from two communities. Kata containers keeps evolving quickly over the past year, lots of new features were added, code quality and stability are better and better, more people from more companies are joining. This is the charm of open source, this is the power of open source, and this is the most interesting and coolest thing I have done in my past years! Happy birthday, Kata!”

— Wei Zhang (@WeiZhang555), Member of Kata Containers Architecture Committee

“It has been exciting as a member of the Kata Community and observing all the exciting changes over the last year. After experiencing the significant adoption of containers in the industry over the last few years, I was instantly captivated by Kata Containers and the idea of running containers in a VM to make it easier to sandbox container workloads. Then, the recent announcement from Amazon to open source their Firecracker micro VM project continues to ignite the excitement in the community.  I am looking forward to noticing more Kata Containers running in more production environments.”

— Rico Aravena (@raravena80), Kata Containers Contributor

Cover image: Via Lego.com

The post Contain your excitement: Kata turns one! appeared first on Superuser.

by Claire Massey at December 06, 2018 02:02 PM

About

Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology you should add your OpenStack blog.

Last updated:
January 19, 2019 06:37 AM
All times are UTC.

Powered by:
Planet