December 14, 2018

Emilien Macchi

OpenStack Containerization with Podman – Part 4 (Healthchecks)

For this fourth episode, we’ll explain how we implemented healthchecks for Podman containers. Don’t miss the first, second and third episodes where we learnt how to deploy, operate and upgrade Podman containers.

In this post, we’ll see the work that we have done to implement container healthchecks with Podman.

Note: Jill Rouleau wrote the code in TripleO to make that happen.

Context

Docker can perform health checks directly in the Docker engine, without the need for an external monitoring tool or sidecar containers.

A script (usually per-image) would be run by the engine, and the return code would define whether or not a container is healthy.

Example of healthcheck script:

curl -g -k -q --fail --max-time 10 --user-agent curl-healthcheck \
--write-out "\n%{http_code} %{remote_ip}:%{remote_port} %{time_total} seconds\n" https://my-app:8774 || return 1
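
For comparison, with Docker such a script is typically wired into the engine itself, either via a HEALTHCHECK instruction in the image or with run-time flags. A minimal sketch of the run-time variant (the image name and script path here are placeholders, not taken from our images):

$ docker run -d --name my-app \
    --health-cmd '/bin/healthcheck' \
    --health-interval 30s \
    --health-timeout 10s \
    --health-retries 3 \
    my-app-image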

It was originally built so that unhealthy containers could be rescheduled or removed by the Docker engine. The health status can be verified with the docker ps or docker inspect commands:

$ docker ps
my-app "/entrypoint.sh" 30 seconds ago Up 29 seconds (healthy) 8774/tcp my-app

With Podman, however, we no longer have that kind of engine. That monitoring interface has been useful in our architecture, though, so we wanted to keep it available for our operators to verify the state of the containers.

Several options were available to us:

  • systemd timers (cron-like) to schedule the health checks; an example is documented in the CoreOS manuals.
  • use Podman pods with a sidecar container running the health checks.
  • add a scheduling function to conmon.
  • a systemd service, like a podman-healthcheck service, that would run on a fixed interval.

If you remember from the previous posts, we already rely on systemd to control the containers, for example to restart them automatically on failure and to start them at boot. With that in mind, we went with the first option, which seemed the easiest to integrate and the least invasive.

Implementation

The systemd timer is a well-known mechanism: a native systemd feature that runs a given service on a controlled schedule. In our case the service is a "oneshot" type, executing the healthcheck script present in the container image.

Here is how we did it for our OpenStack containers (with a 30-second timer for healthchecks, configurable like a cron entry):

# my_app_healthcheck.timer

[Unit]
Description=my_app container healthcheck
Requires=my_app_healthcheck.service
[Timer]
OnUnitActiveSec=90
OnCalendar=*-*-* *:*:00/30
[Install]
WantedBy=timers.target
# my_app_healthcheck.service

[Unit]
Description=my_app healthcheck
Requisite=my_app.service
[Service]
Type=oneshot
ExecStart=/usr/bin/podman exec my_app /bin/healthcheck
[Install]
WantedBy=multi-user.target

Activate the timer and service:

$ systemctl daemon-reload
$ systemctl enable --now my_app_healthcheck.service
$ systemctl enable --now my_app_healthcheck.timer

Check the service & timer status:

$ service my_app_healthcheck status
Redirecting to /bin/systemctl status my_app_healthcheck.service
● my_app_healthcheck.service - my_app healthcheck
   Loaded: loaded (/etc/systemd/system/my_app_healthcheck.service; enabled; vendor preset: disabled)
   Active: activating (start) since Fri 2018-12-14 20:11:00 UTC; 158ms ago
 Main PID: 325504 (podman)
   CGroup: /system.slice/my_app_healthcheck.service
           └─325504 /usr/bin/podman exec my_app /bin/healthcheck
Dec 14 20:11:00 myhost.localdomain systemd[1]: Starting my_app healthcheck...

$ service my_app_healthcheck.timer status
Redirecting to /bin/systemctl status my_app_healthcheck.timer
● my_app_healthcheck.timer - my_app container healthcheck
   Loaded: loaded (/etc/systemd/system/my_app_healthcheck.timer; enabled; vendor preset: disabled)
   Active: active (waiting) since Fri 2018-12-14 18:42:22 UTC; 1h 30min ago
Dec 14 18:42:22 myhost.localdomain systemd[1]: Started my_app container healthcheck.

$ systemctl list-timers
NEXT                         LEFT          LAST                         PASSED       UNIT                                               ACTIVATES
Fri 2018-12-14 20:14:00 UTC  361ms left    Fri 2018-12-14 20:13:30 UTC  29s ago      my_app_healthcheck.timer               my_app_healthcheck.service
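
Since each container gets its own healthcheck service, an operator can also spot unhealthy containers at a glance by listing the failed oneshot units (a sketch based on the naming convention used above):

$ systemctl list-units --state=failed '*_healthcheck.service'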

Now that it's implemented, let's try it!

Demo

Stay tuned for the next post in the series on deploying TripleO and Podman!

Source of the demo.

by Emilien at December 14, 2018 09:46 PM

Ben Nemec

Openstack Virtual Baremetal 2.0 Update

As mentioned in a previous update, OVB 2.0 is coming. This update is to notify everyone that a development branch is available in the repo and to discuss some of the changes made so far.

First, here's the 2.0-dev branch. It currently contains most, if not all, of the 2.0 changes that I have planned, so it can be used for testing. Please do test it if you depend on OVB for anything. I believe the migration should be fairly painless if you've been keeping up with the recommended deployment methods, but if you do find any problems let me know as soon as possible.

Here's a general overview of the changes for OVB 2.0:

  • Routed networks. 2.0 is based on the routed-networks branch so it supports all of the functionality added as part of that work.
  • Using the parameters section of a Heat environment is no longer supported. Everything must be in parameter_defaults now.
  • port-security is used by default. This requires at minimum a Mitaka OpenStack cloud, although most of my testing has been Newton or higher. Symlinks have been left for the old port-security environments to ease migration. This also means it is no longer recommended to use the Neutron noop firewall driver.
  • Most, if not all, deprecated features and options have been removed.
  • The BMC now signals back to Heat whether it has succeeded or failed to bring up the BMC services. This should allow failures to be caught much earlier in the deploy process.
  • The BMC install script now uses only OpenStackClient, rather than the service-specific clients. Note that this requires an updated BMC image. Existing BMC images can still be used, but they will have to install the openstackclient package. It is recommended that you pull down the new image so no external resources are needed for BMC deployment. Note that the new BMC image is now the default one for download, but if you need the old one for some reason it is still available too.
  • Some (mostly internal) interfaces have changed. As a result, if you have custom environments or templates it is possible that they will need updating for use with 2.0. If you're only using the default templates and environments shipped with OVB they should continue to work.

EDIT I forgot one other thing. There's an RFE for making the BMC installation more robust that isn't yet in 2.0. I have a GitHub issue open to track it. It doesn't involve any breaking changes though so it doesn't have to block 2.0 from going live (in fact, it could be done in 1.0 too). /EDIT

I think that covers the highlights. If you have any questions or concerns about these changes don't hesitate to contact me, either through the GitHub repo or #tripleo on Freenode. Thanks.

by bnemec at December 14, 2018 06:12 PM

OpenStack Superuser

Why evolving tech is powered by new, diverse contributors

“Ba bump. Ba bump. Do you hear it? That’s the heartbeat of the data center, powered by OpenStack. That is what you folks have built. We have so much strength in this community. You’ve worked so hard to enable deployable, stable, mature, secure software that enables private and hybrid cloud models to flourish across the ecosystem,” said Melissa Evers-Hood, director of edge and cloud orchestration stacks at Intel as she spoke to the audience at the recent OpenStack Summit in Berlin. This strength is due, in part, to the amazing diversity within the OpenStack community, which was great to see throughout the week.

As OpenStack evolves to support open infrastructure—highlighted by Mark Collier through pilot projects Airship, Kata Containers, StarlingX and Zuul—and as exciting new use cases emerge, the ability to attract and retain diverse talent, and for us to collaborate across many different projects and communities, has never been more important.

It was great to see Joseph Sandoval talk in his keynote about the importance of new contributors, and of mentors and mentorship programs, as core to sustaining our community. "OpenStack is a strategic platform that I believe will enable diversity … I've been helping individuals … bringing them along and pointing them in the direction of open source projects where they can learn and find their place within those communities, and giving them the technical acumen so they can succeed and find their way." He reminded us that, "Mentorship comes in many forms … Show up and support these programs. These programs need you."

The mentorship theme was carried through a speed mentoring luncheon, facilitated by Amy Marrich and Nicole Huesman. The workshop, which has become a mainstay at the summits, attracted a nice turnout and supported rich engagement between mentors and mentees across career, community and technical tracks.

The diversity luncheon, hosted by the Diversity Working Group and sponsored by Intel, opened its doors for the first time to a more diverse audience. This evolution seemed to parallel that of OpenStack itself, and it was wonderful to see full representation—across women, men and underrepresented minorities—to recognize and celebrate the diversity within our community. Melissa Evers-Hood from Intel welcomed guests and Madhuri Kumari, cloud software engineer at Intel, offered her personal experiences as a young, new contributor in the OpenStack community. Joseph Sandoval then offered his insights about the importance of male allies and advocates, as well as mentoring, to build more diverse, inclusive communities.

Ell Marquez, technical evangelist at Linux Academy, led a mentoring panel on the last day of the event. The panel facilitated a robust discussion about mentorship programs and surfaced two points of clarity: there's deep interest in mentorship programs, and while the OpenStack community has several programs and resources for new contributors, greater awareness and education is needed around how to tap into them—a great problem to have!

Panelists included:

  • Amy Marrich, OpenStack User Committee Member and Diversity Working Group chair, and OpenStack course architect at Linux Academy
  • Nicole Huesman, community and developer advocate, Intel
  • Jill Rouleau, senior software engineer, Red Hat
  • Daniel Izquierdo, co-founder, Bitergia

The week in Berlin was an inspiring testament to the progress and momentum of diversity within the OpenStack community. As OpenStack takes shape as the foundation for open infrastructure and new projects and technologies emerge to tackle the challenges of IoT, edge and other exciting use cases, we'll continue to strive for greater diversity and welcome new contributors of all sizes, shapes and colors into our thriving community.

Stay tuned for how to get involved in upcoming events for the Open Infrastructure Summit in 2019.

The post Why evolving tech is powered by new, diverse contributors appeared first on Superuser.

by Superuser at December 14, 2018 03:19 PM

December 13, 2018

OpenStack Superuser

How to use Alexa to keep an eye on your virtual machines

With about 100 lines of Python code, Amazon's virtual assistant Alexa can check up on your data center.

“Bear with me, I’m not a programmer” says Sebastian Wenner of T-mobile before offering up his proof-of-concept of how to connect Alexa with the OpenTelekom Cloud at the recent OpenStack Summit Berlin.

He showed participants what components are needed, how to interact with Alexa and the Alexa Skills Kit (ASK) and demonstrated a few functions, too.  To create the POC, he “just applied some common sense here and some historic knowledge from my studies in the last millennium.”

Why use Alexa? She's already a fixture in many homes — playing music, controlling smart devices, delivering news and weather — and the device is only expected to become more popular. By 2022, more than half of American homes will have a smart speaker, according to Juniper Research.

Here’s what you’ll need to get started:


And then you need to learn how to talk to Amazon's cloud-based voice service. The first thing you need to know about is something called an "utterance," that is, something that you say, Wenner notes. You define a string like "status report" that's heard by the device; that utterance then gets translated to an intent, and the intent is called to trigger an action.

Wenner’s example?

“Alexa start OTC control center.

Alexa: “OTC control center is online.”

“How many VMs are running inside my tenant?”

Alexa: “The total number of virtual machines in your tenant is 52 at the moment, 39 are running, 13 are shut down.”

“Shut down.”

Alexa: “OTC center shutting down.”

This isn’t the first time a Stacker has made a connection with Alexa. In 2017, Jaesuk Ahn and Seungkyu Ahn kicked off OpenStack Days Korea with a live demo asking Amazon’s Alexa AI to deploy OpenStack in three minutes using containers. (Check out that demo here.)

See Wenner show off his handiwork in the 9-minute video below and download the slides with details on the architecture here or more on GitHub.

Cover photo // CC BY NC

The post How to use Alexa to keep an eye on your virtual machines appeared first on Superuser.

by Superuser at December 13, 2018 04:58 PM

December 12, 2018

SWITCH Cloud Blog

Hack Neutron to add more IP addresses to an existing subnet

When we designed our OpenStack cloud at SWITCH, we created a network in the service tenant, and we called it private.

This network is shared with all tenants and it is the default choice when you start a new instance. The name private comes from the fact that you will get a private IP via DHCP. The subnet we chose for this network is 10.0.0.0/24. The allocation pool goes from 10.0.0.2 to 10.0.0.254 and it can't be enlarged any further. This is a problem because we need IP addresses for many more instances.

In this article we explain how we successfully enlarged this subnet to a wider range: 10.0.0.0/16. This operation is not a feature supported by Neutron in Juno, so we show how to hack into Neutron internals. We were able to successfully enlarge the subnet and modify the allocation pool, without interrupting the service for the existing instances.

In the following we assume that the network we are talking about has only one router; however, this procedure can easily be extended to more complex setups.

What you should know about Neutron is that a Neutron network has two important namespaces on the OpenStack network node.

  • The qrouter is the router namespace. In our setup one interface is attached to the private network we need to enlarge and a second interface is attached to the external physical network.
  • The qdhcp namespace has only one interface to the private network. On your OpenStack network node you will find a dnsmasq process bound to this interface to provide IP addresses via DHCP.

Neutron Architecture

In the figure Neutron Architecture we try to give an overview of the overall system. A virtual machine (VM) can run on any remote compute node. The compute node has an Open vSwitch process running that collects the traffic from the VM and, with proper VXLAN encapsulation, delivers it to the network node. The Open vSwitch on the network node has a bridge containing both the qrouter namespace's internal interface and the qdhcp namespace, which makes the VMs see both the default gateway and the DHCP server on the virtual L2 network. The qrouter namespace has a second interface to the external network.
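
On the network node, these two namespaces can be listed with iproute2 (a quick sanity check; the IDs will of course differ in your deployment):

sudo ip netns list | grep -E 'qrouter|qdhcp'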

Step 1: hack the Neutron database

In the Neutron database, look for the subnet; you can easily find it in the subnets table by matching the service tenant id:

select * from subnets WHERE tenant_id='d447c836b6934dfab41a03f1ff96d879';

Take note of the subnet's id (which in this table is the subnet_id) and network_id. In our example we had these values:

id (subnet_id) = 2e06c039-b715-4020-b609-779954fa4399
network_id = 1dc116e9-1ec9-49f6-9d92-4483edfefc9c
tenant_id = d447c836b6934dfab41a03f1ff96d879

Now let’s look into the routers database table:

select * from routers WHERE tenant_id='d447c836b6934dfab41a03f1ff96d879';

Again filter for the service tenant. We take note of the router ID.

 id (router_id) = aba1e526-05ca-4aca-9a80-01601cdee79d

At this point we have all the information we need to enlarge the subnet in the Neutron database.

update subnets set cidr='NET/MASK' WHERE id='subnet_id';

So in our example:

update subnets set cidr='10.0.0.0/16' WHERE id='2e06c039-b715-4020-b609-779954fa4399';

Nothing will happen immediately after you update the values in the Neutron mysql database. You could reboot your network node and Neutron would rebuild the virtual routers with the new database values. However, we show a better solution to avoid downtime.

Step 2: Update the interface of the qrouter namespace

On the network node there is a namespace qrouter-<router_id>. Let's have a look at the interfaces using iproute2:

sudo ip netns exec qrouter-<router_id> ip addr show

With the values in our example:

sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr show

You will see the typical Linux output with all the interfaces that live in this namespace. Take note of the interface name with the address 10.0.0.1/24 that we want to change, in our case

 qr-396e87de-4b

Now that we know the interface name we can change IP address and mask:

sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr add 10.0.0.1/16 dev qr-396e87de-4b
sudo ip netns exec qrouter-aba1e526-05ca-4aca-9a80-01601cdee79d ip addr del 10.0.0.1/24 dev qr-396e87de-4b

Step 3: Update the interface of the qdhcp namespace

Still on the network node, there is a namespace qdhcp-<network_id>. Exactly as we did for the qrouter namespace, we find the interface name and change the IP address to use the updated netmask.

sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr add 10.0.0.2/16 dev tapadebc2ff-10
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr del 10.0.0.2/24 dev tapadebc2ff-10
sudo ip netns exec qdhcp-1dc116e9-1ec9-49f6-9d92-4483edfefc9c ip addr show

The dnsmasq process bound to the interface in the qdhcp namespace is smart enough to automatically detect the change in the interface configuration. This means that from this point on, new instances will get a /16 netmask via DHCP.

Step 4: (Optional) Adjust the subnet name in Horizon

We had named the subnet 10.0.0.0/24. For purely cosmetic reasons, we logged in to the Horizon web interface as admin and changed the name of the subnet to 10.0.0.0/16.

Step 5: Adjust the allocation pool for the subnet

Now that the subnet is wider, the neutron client will let you configure a wider allocation pool. First check the existing allocation pool:

$ neutron subnet-list | grep 2e06c039-b715-4020-b609-779954fa4399

| 2e06c039-b715-4020-b609-779954fa4399 | 10.0.0.0/16     | 10.0.0.0/16      | {"start": "10.0.0.2", "end": "10.0.0.254"}           |

You can easily resize the allocation pool like this:

neutron subnet-update 2e06c039-b715-4020-b609-779954fa4399 --allocation-pool start='10.0.0.2',end='10.0.255.254'
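
To double-check the result, the subnet can be inspected again; the allocation pool should now span the /16 (command sketch, output omitted):

$ neutron subnet-show 2e06c039-b715-4020-b609-779954fa4399 | grep allocation_pools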

Step 6: Check status of the VMs

At this point the new instances will get an IP address from the new allocation pool.

As for the existing instances, they will continue to work with the /24 address mask. In case of a reboot, they will get the same IP address via DHCP but with the new address mask. Also, when the DHCP lease expires, depending on the DHCP client implementation, they will hopefully pick up the updated netmask. This is not the case with the default Ubuntu dhclient, which will not refresh the netmask when the IP address offered by the DHCP server does not change.

The worst-case scenario is when a machine keeps the old /24 address mask for a long time. Outbound traffic to other machines in the private network might then take a suboptimal route through the network node, which is used as the default gateway.
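
If rebooting or waiting for the DHCP client is not an option, the wider mask can also be applied by hand inside the guest, mirroring what we did in the namespaces (a sketch; replace 10.0.0.50 and eth0 with the instance's actual address and interface):

sudo ip addr add 10.0.0.50/16 dev eth0
sudo ip addr del 10.0.0.50/24 dev eth0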

Conclusion

We successfully expanded a Neutron network to a wider IP range without service interruption. By understanding Neutron's internals, it is possible to make changes that go beyond Neutron's built-in features. It is very important to understand how the values in the Neutron database are used to create the network namespaces.

We understood that a better design for our cloud would be to have a default Neutron network per tenant, instead of a shared default network for all tenants.

by Saverio Proto at December 12, 2018 02:14 PM

Aptira

Software is Eating the Network. Part 3 – Software Defined Networking (SDN)

In this and the next post, we'll cover the last two components of the Open Network Software domain of Open Networking. This post is about Software Defined Networking, and the next post will be on Network Functions Virtualisation.

By the mid-2000s, as massive demand for network services drove enterprises to build larger and larger networks, a paradox emerged: product innovation at the component level was still heavily dominated by proprietary vendor architectures, while at the same time scaling and operationalising networks at the levels being driven by corporations like Google and Amazon, and even the US Government, was problematic – and expensive.

Once you bought a piece of networking hardware, you didn't really have the freedom to re-program it. … You would buy a router from Cisco and it would come with whatever protocols it supported and that’s what you ran.

Scott Shenker, UC Berkeley computer science professor and former Xerox PARC researcher (https://www.wired.com/2012/04/nicira/)

At one level, there is good reason for this: operational stability.  

If you buy switches from a company, you expect them to work. A networking company doesn't want to give you access and have you come running to them when your network melts down because of something you did.

Scott Shenker

On the other hand, this characteristic gave the network vendors enormous control and leverage over their marketplaces. Companies that bought their products were highly dependent on these vendors, both for feature innovation and for addressing operational issues.

A small number of companies sought to address these dependencies. In 2005, Google started to build its own networking hardware, in part because it needed more control over how the hardware operated. Again in 2010, Google was building its own networking hardware for its “G-Scale Network”. But Google was less interested in building its own network hardware than it was in driving the software-isation of the network stack.

It’s not hard to build networking hardware. What’s hard is to build the software itself as well.

Urs Hölzle, Google's SVP of Infrastructure

In 2008, Google had deployed an internally developed Load Balancer called Maglev which was entirely based on software running on commodity hardware. 

Although Google was quite capable of developing software components internally, when it came to the G-Scale Network it turned to an outside development that would change the balance between hardware and software forever.

In 2003, Martin Casado, while studying at Stanford, began developing OpenFlow, software that enabled a new type of network that exists only as software and can be controlled independently of the physical switches and routers running beneath it. Casado wanted a network that could be programmed like a general-purpose computer and that would work with any networking hardware.

Anyone can buy a bunch of computers and throw a bunch of software engineers at them and come up with something awesome, and I think you should be able to do the same with the network. We've come up with a network architecture that lets you have the flexibility you have with computers, and it works with any networking hardware.

Martin Casado (https://www.wired.com/2012/04/nicira/)

This approach is called Software Defined Networking (SDN). With SDN, the network hardware became less relevant. Large users were able either to develop software solutions on commodity hardware or to buy the basic network hardware directly from manufacturers in Asia (who also supplied the big network vendors).

Software Defined Networking purposefully combines the flexibility of software development with the raw power of network devices to produce an intelligent network fabric. 

In 2011, the Open Networking Foundation (ONF) was founded to transfer control of OpenFlow to a not-for-profit organization, and the ONF released version 1.1 of the OpenFlow protocol on 28 February 2011. Interest boomed and prompted new product lines from vendor startups offering SDN capabilities on open hardware-based equipment. 

Google rolled out its “G-Scale Network“, entirely based on OpenFlow by 2012, and many of the top-ranked internet companies implemented SDN-based networks in parallel or soon after. 

Based on SDN, software now plays a controlling part in the open network and enables a new set of applications to be built that leverage this network fabric. With the rollout of SDN as a major driver of network design, deployment, operation and evolution, we see the requirements for success changing. Successful networks are now less about bolting together nodes and links with (relatively) pre-defined characteristics.  Successful networks now start to take on aspects of the Software Development Lifecycle (SDLC).  This change has huge implications for enterprises of all sizes because it completely changes the fundamental paradigm of how distributed computing resources are deployed. 

Hot on the heels of SDN came another key technology within Open Networking, i.e. Network Functions Virtualisation (NFV). NFV only further entrenches the software-isation of computing infrastructure and the need for a ‘software first’ paradigm. 

We'll see more about this in our next post. Stay tuned.

Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.

Learn More

The post Software is Eating the Network. Part 3 – Software Defined Networking (SDN) appeared first on Aptira.

by Adam Russell at December 12, 2018 10:30 AM

OpenStack Superuser

Containers on a Fortnite scale

SEATTLE — You might think that the only thing more exhilarating than playing Fortnite is being responsible for uptime on a game with 200 million registered users that raked in $300 million in revenue in a single month.

You’d be wrong. The game credited with being so addictively fun it’s sending some users into rehab isn’t keeping those who run it up at night.

“It turns out that scaling a video game isn’t that different than scaling any other successful product,” says Paul Sharpe, principal cloud engineering developer at Epic Games, maker of Fortnite, who shared his story about the move to Kubernetes at the press and analyst briefing at Kubecon. “It’s the same sets of challenges.”

Sharpe’s tenure at Epic — a little over a year — coincides with the explosive growth of the game. His previous tours of duty in tech include Twitter, Amazon Web Services and Amazon.

Just like a lot of businesses, he says that modern game development is “actually a whole lot of micro-services,” adding that Epic was already heavily invested in AWS, “all in on public cloud” and employed containerization tech such as Docker. Sharpe describes Epic as a “big Linux shop” where the micro-services (some REST-ful, others not) are written in a number of languages, including Java and Scala.

“Moving to Kubernetes was a natural evolution of our workloads,” Sharpe says. “It basically comes down to trying to improve our developer’s lives.” Right now, the devs have to do a lot manually,  including things like EC2 instances and load balancers. “K8s lets us provide abstractions so they don’t have to deal with that directly and focus on problems they need to solve.” They currently use Amazon Elastic Container Service for Kubernetes (EKS.)

In terms of other cloud native tech, Sharpe says they're currently working with Prometheus, FluentD, InfluxData and Telegraf. What's next? As observability becomes a major focus, his team is “very interested” in OpenTracing as well as Jaeger and Zipkin but “haven't fully decided on that yet.” In terms of metrics, they're just getting started with Kubernetes; the clusters right now are “pretty small” but Epic is “ramping up into production as we speak.”

“We’re a big game company but we’re small in the amount of resources that we have to manage these kinds of things,” he says.

 

The Linux Foundation provided travel and accommodation.

Cover photo // CC BY NC

The post Containers on a Fortnite scale appeared first on Superuser.

by Nicole Martinelli at December 12, 2018 01:03 AM

December 11, 2018

OpenStack Superuser

Where the cloud native approach is taking NFV architecture for 5G

Network functions virtualization is an exciting technology. By integrating with software-defined networking, NFV offers huge enhancements for cloud service providers (CSPs) on their journey to enable 5G for customers.

This year, leading CSPs (Telstra, Deutsche Telekom, AT&T, Verizon, Telefonica) took a great leap forward, launching 5G internet in selected cities. CSPs were well supported by leading vendors (Ericsson, VMware, Nokia and others) who have offered and deployed NFV/SDN. ETSI MEC, the Linux Foundation and other communities have provided great support. However, more work needs to be done to reach the level that telecom networks ultimately need, because 5G internet will offer even more advanced features and fast connectivity to support innovative technologies (autonomous cars, augmented reality, virtual reality, gaming, the internet of things, blockchain, etc.).

Additionally, CSPs have started evaluating mobile edge computing (MEC) architectures to get closer to digital devices, with edge servers providing instant computing and processing of the data those devices generate. This overall architecture definitely needs end-to-end automation and must be able to support services dedicated to a specific spectrum or use case (network slicing, for example).

So, 5G is supposed to bring advances to an ever more demanding telecom network where NFV architecture forms the base. But despite progress in adopting NFV and transforming networks to use virtualized network functions, there's work to be done in key development and operations areas that still call the future of agile networks into question.

In a telecom network built on NFV architecture, the majority of functions and operations are driven and handled by software applications. This is really the key factor in the progression of NFV, making it possible to control the overall network (SDN) and its functions with software. Still, progress has been a bit slower than expected, four years after the inception of the NFV concept. However, as containerization has grown, enterprises have started adopting the cloud-native approach en masse.

Let’s discuss key areas and what progress has been made around cloud-native application methodologies in NFV architecture.

Cloud native VNFs: Containerized, micro-services  and dynamic orchestration

Recently, Nokia released the latest version of its NFV cloud platform, CloudBand (CB 19). The focus of this release was to provide an enterprise-ready platform for edge computing deployments. To support edge platforms, end-to-end automation and consistency in updates, CB 19 will leverage containers, managed using Kubernetes, to host virtual network functions (VNFs). Virtual machines, which were typically used to host VNFs, are not completely out of the picture, but these VMs will now be updated more quickly thanks to OpenStack integration in CB 19.

This release shows how deploying VNFs in containers is a crucial step to enable further innovations and use cases. It enables the cloud-native approach, wherein monolithic applications are broken into micro-services that can be developed, scaled and patched independently and that communicate among themselves through rich APIs. Kubernetes, which has evolved to manage such workloads, provides dynamic orchestration for VNFs. Another implementation has already been discussed by the OpenStack community as a proposed feature for an upcoming Tacker release. We can expect more work by NFV vendors and will see integrations become available in upcoming releases.

Continuous integration/continuous deployment

Industry has rapidly adopted cloud native application development methodologies, which came into existence to continuously upgrade deployed applications and introduce new services quickly. As the number of connected devices grows, each device generates a huge amount of data, and enterprises are building AI-based smart applications to analyze that rich data stream. As Michael Dell recently posted: “Companies will succeed and fail based on their ability to translate data into insights at record speed, which means enabling the movement of relevant data, computing power and algorithms securely and seamlessly across the entire ecosystem. The work, innovation and investment to support that is happening now.”

To keep up with rapid application development and upgrades, a CI/CD approach is required for VNFs. VNFs are deployed not only in central data centers but also at the edge of the network and in network slices (for specific use cases within the same network). Based on such an architecture, network services will be channeled to specific regions and specific use cases. All of them need end-to-end deployment automation, continuous enhancements and fast new service launches to stay competitive and in sync with customer demands.

End-to-end continuous monitoring/testing

NFV-based 5G networks will be a complicated mesh that constantly generates data from users. When monitoring these networks, it will be crucial to keep an eye on network glitches and possible attacks. In addition, the overall architecture will be built from solutions and resources provided by diverse vendors and applications, orchestrated using a variety of frameworks with different workflows. All these factors affect the performance of service delivery to end consumers, which can hamper business cases where latency and bandwidth requirements are critical. Expect work on solutions for active testing and monitoring; such solutions might find space in the market in the coming year and proliferate further as networks transition to become NFV-based.

About the author

Sagar Nangare, a digital strategist at Calsoft Inc., is a marketing professional with over seven years of experience of strategic consulting, content marketing and digital marketing. He’s an expert in technology domains like security, networking, cloud, virtualization, storage and IoT.

Check out the NFV sessions from the recent Berlin Summit.

The post Where the cloud native approach is taking NFV architecture for 5G appeared first on Superuser.

by Sagar Nangare at December 11, 2018 03:01 PM

Trinh Nguyen

Searchlight weekly report - Stein R-19 & R-18



For the last two weeks, Stein R-19 and R-18, we've been focusing on fixing a bug that fails most of Searchlight's functional tests. It's blocking Searchlight: if the tests don't pass, we can't merge anything new.

We identified the reason: the Elasticsearch instance for the functional tests cannot be started. I'm trying to tune the functional test setup [1] to make it work, but it seems there is still more work to do. I'm also working with the people at the OpenStack Infrastructure team to see what the real problem could be. Hopefully, we can fix it by the end of this week.
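
For anyone who wants to try reproducing the failure locally, the functional tests can typically be run with tox from the Searchlight repository (a sketch, assuming the usual OpenStack tox layout; the environment name may differ in tox.ini):

$ tox -e functional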

[1] https://review.openstack.org/#/c/622871/

by Trinh Nguyen (noreply@blogger.com) at December 11, 2018 05:29 AM

Viet OpenStack (now renamed Viet OpenInfa) second webinar 10 Dec. 2018



Yes, we did it: the second OpenStack Upstream Contribution webinar. This time we focused on debugging tips and tricks for first-time developers. We also had time to introduce some great tools such as Zuul CI [1] (and how to use the Zuul status page [2] to keep track of running tasks), ARA reports [3] and tox [4]. During the session, attendees shared some great experiences debugging OpenStack projects (e.g., how to read logs, use an IDE, etc.). A lot of good questions were raised, such as how to use ipdb [7] to debug running services (using ipdb to debug is quite hardcore, I think :)). You can check out this GitHub link [5] for chat logs and other materials.

I want to say thanks to all the people at the Jitsi open source project [6] who provide a great conferencing platform for us. We were able to have a smooth video discussion without any limitation or interruption, and the experience was great.

Watch the recorded video here: https://youtu.be/rI2zPQYtX-g




References:

[1] https://zuul-ci.org/
[2] http://zuul.openstack.org/status
[3] http://logs.openstack.org/78/570078/10/check/neutron-grenade-multinode/303521d/ara-report/reports/
[4] https://tox.readthedocs.io/en/latest/
[5] https://github.com/dangtrinhnt/vietstack-webinars/tree/master/second
[6] https://jitsi.org/
[7] https://pypi.org/project/ipdb/

by Trinh Nguyen (noreply@blogger.com) at December 11, 2018 02:12 AM

December 10, 2018

OpenStack Superuser

One to rule them all: The OpenStack Foundation unifies mailing lists

This time, it's not about invisibility. In an effort to increase participation and surface the most relevant conversations and contributions, the OpenStack Foundation mailing lists have merged. That means that the openstack, openstack-dev, openstack-sigs and openstack-operators mailing lists have been replaced by a new openstack-discuss at lists.openstack.org mailing list.

If you were signed up to the previous lists, you’ll still need to join this one. (The reason: part netiquette, part legal.) The new list is open to all discussions or questions about use, operation or future development of OpenStack.  If you’re unsure about how to tag your topic to make sure your voice is heard, check out the most recent archive of posts and the tagging guidelines here.

What’s behind the effort to combine the lists?

“For one, the above list behavior change to address DMARC/DKIM issues is a good reason to want a new list; making those changes to any of the existing lists is already likely to be disruptive anyway as subscribers may be relying on the subject mangling for purposes of filtering list traffic,” writes Jeremy Stanley, infrastructure engineer for the OSF.  “We have many suspected defunct subscribers who are not bouncing…so this is a good opportunity to clean up the subscriber list and reduce the overall amount of email unnecessarily sent by the server.”

There’s another reason behind the mailing lists coming together, writes Chris Dent. “The hope is to break down some of the artificial and arbitrary boundaries between developers, users, operators, deployers and other ‘stakeholders’ in the community. We need and want to blur the boundaries. Everyone should be using, everyone can be developing.”

Dent, a member of the Technical Committee, longtime contributor and prolific chronicler of the OpenStack community, offers up a few helpful reminders about how to make mailing lists work better.

“You’re trying to make the archive readable for the people who come later. It’s the same as code: you’re not trying to make it maintainable by you. It’s not about you. It’s about other people. Who aren’t there right now.”

The post One to rule them all: The OpenStack Foundation unifies mailing lists appeared first on Superuser.

by Superuser at December 10, 2018 02:04 PM

December 07, 2018

OpenStack Superuser

What’s happening now with edge and OpenStack

At the recent Berlin Summit, there was a dedicated track for edge computing with numerous presentations and panel discussions, all of which were recorded. If you'd like to catch up or see some sessions again, visit the OpenStack website for the videos.

In parallel to the conference,  the Forum took place with 40-minute-long working sessions for developers, operators and users to meet and discuss new requirements, challenges and pain points to address.

Let’s start with a recap of  the OSF Edge Computing Group and Edge Working Group sessions. (If you’re new to the activities of this group you may want to read my notes on the Denver PTG to catch up on the community’s and the group’s work on defining reference architectures for edge-use cases.)

During the Forum, we continued to discuss the minimum viable product (MVP) architecture that we started at the last PTG. Due to limited time available, we concluded on some basics and agreed on action items to follow up on. Session attendees agreed that the MVP architecture is an important first step and we will keep its scope limited to the current OpenStack services listed on the Wiki capturing the details. Although there’s interest in adding further services such as Ironic or Qinling, they were tabled for later.

The Edge WG is actively working on capturing edge computing use cases in order to understand the requirements better and to work together with the OpenStack and StarlingX projects on design and implementation, based on the input the group has been collecting. We had a session about use cases to identify the ones the group should focus on; the most interest was expressed for immediate action on vRAN and edge cloud, uCPE and industrial control.

The group is actively working on mapping the MVP architecture options to the use cases it has identified and on getting more details on the ones we identified during the Forum session. If you are interested in participating in these activities, please see the details of the group's weekly meetings.

While the MVP architecture work focuses on a minimalistic view to provide a reference architecture with the covered services prepared for edge use cases, there's work ongoing in parallel in several OpenStack projects. (You can find notes on the Forum Etherpads on the progress of projects such as Cinder, Ironic, Kolla-Ansible and TripleO.) The general consensus of the project discussions was that the services are in good shape for edge requirements and that there is a clear path forward, for example improving availability zone functionality or the remote management of bare metal nodes.

With all the work ongoing in the projects as well as in the Edge WG, the expectation is that we’ll be able to easily move to the next phases with MVP architectures when the working group is ready. Both the group and the projects are looking for contributors for identifying further requirements, use cases or implementation and testing work.

Testing will be a crucial area for edge, and we're looking into both cross-project and cross-community collaborations with projects such as OPNFV and Akraino.

While we didn't have a Keystone-specific Forum session for edge this time, a small group came together to discuss next steps with federation. We're converging towards some generic feature additions to Keystone based on the Athenz plugin from Oath. (Here's a Keystone summary from Lance Bragstad that includes plans related to edge.)

We had a couple of sessions about StarlingX, both in the conference part of the Summit and at the Forum. You can check out the project update and other relevant sessions among the Summit videos. Because the StarlingX community works closely with the Edge WG as well as the relevant OpenStack project teams, at the Forum we organized sessions focusing on some specific items to plan future work and to better understand requirements for the project.

The team had a session on IoT to talk about the list of devices to consider and the requirements systems need to address in this space. The session also identified a collaboration option between StarlingX, IoTronic and Ironic when it comes to realizing and testing use cases.

With more emphasis being put on containers at the edge, the team also had a session on containerized application requirements with a focus on Kubernetes clusters. During the session we talked about areas like container networking, multi-tenancy and persistent storage, to see what options we have for them and what's missing today to cover that area. The StarlingX community will be focusing more on containerization for the upcoming releases, so feedback and ideas are important.

One more session to mention is the “Ask me anything about StarlingX” held at the Forum, where experts from the community offered help to people who are new to or have questions about the project. The session was well attended, and questions focused on practical angles like footprint and memory consumption.

Get involved

If you'd like to participate in these activities, you can dial in to the Edge WG weekly calls or the weekly use cases calls, check the StarlingX sub-project team calls, find further material on the website about how to contribute, or jump on IRC for the OpenStack project team meetings in your area of interest.

 

 

Cover photo // CC BY NC

The post What’s happening now with edge and OpenStack appeared first on Superuser.

by Ildiko Vancsa at December 07, 2018 02:02 PM

Chris Dent

Placement Update 18-49

This will be the last placement update of the year. I'll be travelling next Friday and after that we'll be deep in the December lull. I'll catch us up next on January 4th.

Most Important

As with last week, progress continues on the work in ansible, puppet/tripleo, kolla and loci to package placement up and establish upgrade processes. All of these things need review (see below). Work on GPU reshaping in virt drivers is getting close.

What's Changed

  • The perfload job, which used to run as part of the nova-next job, now has its own job, running on each change. This may be of general interest because it runs placement "live" but without devstack, resulting in job runs that take less than 4 minutes.

  • We've decided to go ahead with the simple os-resource-classes idea, so a repo is being created.

Slow Reviews

(Reviews which need additional attention because they have unresolved questions.)

  • https://review.openstack.org/#/c/619126/ Set root_provider_id in the database

    This has some indecision because it does a data migration within schema migrations. For this particular case this is safe and quick, but there's concern that it softens a potentially useful boundary between schema and data migrations.

Bugs

Interesting Bugs

(Bugs that are sneaky and interesting and need someone to pick them up.)

  • https://bugs.launchpad.net/nova/+bug/1805858 placement/objects/resource_provider.py missing test coverage for several methods

    This is likely the result of the extraction. Tests in nova's test_servers and friends probably covered some of this stuff, but now we need placement-specific tests.

  • https://bugs.launchpad.net/nova/+bug/1804453 maximum recursion possible while setting aggregates in placement

    This can only happen under very heavy load with a very low number of placement processes, but the code that fails should probably change anyway: it's a potentially infinite loop with no safety breakout.

Specs

Spec freeze is milestone 2, the week of January 7th. There was going to be a spec review sprint next week but it was agreed that people are already sufficiently busy. This will certainly mean that some of these specs do not get accepted for this cycle.

None of the specs listed last week have merged.

Main Themes

Making Nested Useful

Progress continues on gpu-reshaping for libvirt and xen:

Also making use of nested is bandwidth-resource-provider:

Eric's in the process of doing lots of cleanups to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

Extraction

The extraction etherpad is starting to contain more strikethrough text than not. Progress is being made. The main tasks are the reshaper work mentioned above and the work to get deployment tools operating with an extracted placement:

Documentation tuneups:

The functional tests in nova that use extracted placement are working but not yet merged. A child of that patch removes the placement code. Further work will be required to tune up the various pieces of documentation in nova that reference placement.

Other

There are currently only 8 open changes in placement itself. Most of the time critical work is happening elsewhere (notably the deployment tool changes listed above).

Of those placement changes the database-related ones from Tetsuro are the most important.

Outside of placement:

End

In case it hasn't been clear: things being listed here is an explicit invitation (even plea) for you to help out by reviewing or fixing. Thank you.

by Chris Dent at December 07, 2018 12:56 PM

CERN Tech Blog

OpenStack Placement requests - The Uphill and Downhill

Overview: In this blogpost I will describe the Placement scalability issue that we identified when OpenStack Nova was upgraded to Rocky, and all the work done to mitigate this problem. OpenStack Placement: Placement is a service that tracks the inventory and allocations of resource providers (usually compute nodes). The resource tracker runs on each compute node and sends the inventory and allocation information to Placement through a periodic task.

by CERN (techblog-contact@cern.ch) at December 07, 2018 10:28 AM

Aptira

Software is Eating the Network. Part 2 – Open Source Software

Aptira: Software is Eating the Network. Open Source Software

If we accept that Software is Eating the Network, then we accept that understanding and managing software within an Open Networking solution is critical to successful implementation. We now turn to Open Source Software, a component of Open Networking that is a critical multiplier of the benefits that are derived from the components we already covered, and a critical driver of the success and capabilities of Open Networking. 

Open Source Software has exploded into the software marketplace and transformed the approach to Open Networking solutions development. For example, take Github.com, the development platform that hosts a range of software projects, including open-source ones, and was recently bought by Microsoft. Github.com was founded in 2008.

The growth in projects on Github.com is shown in the chart below:  

Aptira: Open Source Software. Github Growth Graph

And the trends in that chart have continued: as of end 2016, Github had 25 million repos, and 67 million at end 2017. Not all of these are Open Source by any means: in 2016, Github hosted 2.8 million open-source projects that explicitly listed Open Source license agreements in the project.

As a capstone to this massive growth in Open Source, on 4th June 2018 Microsoft announced that it was buying Github for US$7.5 billion. Microsoft's then CEO, Steve Ballmer, had once called Linux (and Open Source generally) a “cancer”, back in 2001. Microsoft, a famously proprietary vendor, now finds most of its growth in selling cloud services to developers, and Open Source is clearly now strategic.

What produced this massive growth in Open Source? As with networking products, frustration with vendor driven proprietary software. 

Software as a commercial product paralleled the development of hardware. Although free sharing of software code was common in the early days of software, commercial pressures sidelined this in the 1970s and 1980s. All commercial software was proprietary: it was owned by the vendor, who went to great lengths (often extreme) to prevent that intellectual property from being used without compensation.

Just recently, whilst working with a well-known tertiary institution, I was reminded of how extreme software licence management was in some sectors. This institution was still dealing with an ancient licencing model based on the hardware licence “dongle” that was provided by a software vendor and had to be plugged into the computer on which the software was to be installed & run.  These devices often used up a port (usually the parallel printer port) and were often buggy.  But just the sheer unmanageability of these devices made them a sore point with software users and administrators. 

Vendors argued that their software reduced cost by spreading development investment over many customers, and that a software product based on broad requirements inputs from many customers resulted in a more functional and valuable outcome.   

But it was still only developed by one vendor, and the rate of innovation was directly related to the productivity of that one firm and a function of how much of their revenue the vendor wanted to invest in development and support. Licence costs for software became very expensive, and some software houses very profitable. It was all too easy for the vendor to drift into a “rent seeking” approach of protecting ongoing revenue without necessarily funding ongoing development, which undermined the whole economics of the business model. 

Some thought that it was possible to do better:  

In 1997, Eric Raymond published ‘The Cathedral and the Bazaar’, a reflective analysis of the hacker community and free software principles – (Wikipedia).  

Raymond's view was that different software development situations demanded different business and development models: some software requirements needed the structure and formality of the Cathedral, but most could be satisfied in a bazaar-like market where products were traded easily amongst a variety of stakeholders.

The Open Source Initiative was founded in February 1998 to encourage use of the new term and evangelize open-source principles, and in 1998, Netscape released the source code to its Navigator browser as “Free Software”. 

The principles of Open Source software resonate with the philosophies of the Agile movement, which was emerging in parallel: 

  • Users should be treated as co-developers 
  • Early releases 
  • Frequent integration 
  • Several versions 
  • High modularization 
  • Dynamic decision-making structure

Open source was more than just a software licencing mechanism. It established new business models for commercial software enterprises and drove a materially different model for the software development process itself. Numerous companies around the globe have successfully grown businesses that leverage open source projects with services, “hardened” versions and support fees.

What it meant was that a code base could be developed collaboratively by many parties who could all share in the benefits, without a single vendor controlling the pace or direction of the product evolution. 

Since 2011, major companies have got on board the Open Source bandwagon, developing operationally critical software as open source or releasing in-house developments to open source communities.

With the brand recognition and financial strength of companies like Google, Facebook and AT&T endorsing the open-source approach, and operationalising business critical functionality using open source software, the Open Source approach became more mainstream. 

This enables many benefits but brings with it many risks and challenges. 

In the next 2 posts we will look at two Open Networking software families that were enabled by the Open Source approach and that in themselves will highlight the many benefits of this approach. 

Stay tuned. 

Let us make your job easier.
Find out how Aptira's managed services can work for you.

Find Out Here

The post Software is Eating the Network. Part 2 – Open Source Software appeared first on Aptira.

by Adam Russell at December 07, 2018 04:40 AM

December 06, 2018

OpenStack Superuser

Contain your excitement: Kata turns one!

A year ago, the Kata Containers project was officially announced at KubeCon + CloudNativeCon in Austin, Texas. On the first anniversary of the Kata project, here’s a highlight of the activities and progress made in the global community.

Community activities

In 2018, the Kata community presented technical updates and hosted gatherings at several global events including KubeCon + CloudNativeCon Austin, KubeCon + CloudNativeCon Copenhagen, OpenStack Summit Vancouver, DockerCon San Francisco, LC3 China, Open Source Summit Vancouver, Container Camp UK, DevSecCon Boston, Open Source Summit & KVM Forum Scotland, OpenStack Summit Berlin and DevOpsCon Munich. Kata has also been featured at several OpenInfra Days, OpenStack Days and other container-focused meetups around the world.

The Kata team at launch.

Watch some of the many Kata presentations:

The Kata Containers project joined the Open Container Initiative (OCI) in March as part of the larger announcement that the OpenStack Foundation became a member of OCI. OCI is an open-source initiative that aims to create industry standards around container formats and runtimes. The OCI specs guarantee that container technology can be open and vendor neutral and become a cornerstone of future computing infrastructure. The Kata Containers community continues to work closely with the OCI and Kubernetes communities to ensure compatibility and regularly tests Kata Containers across AWS, Azure, GCP and OpenStack public cloud environments, as well as across all major Linux distributions.

In July, Clear Linux OS announced support for Kata Containers, allowing users to leverage the benefits of Kata Containers in the Clear Linux OS by easily adding a bundle and optionally setting Kata Containers as their default containers option. Through collaboration with Canonical, we also posted a Kata Containers snap image in the Snap store for use with Ubuntu as well as many other Linux distributions. In Q1 2019, we expect to see Kata in the official openSUSE/SLE repositories.

The first Architecture Committee elections were held in September and the community welcomed Eric Ernst (Intel) and Jon Olson (Google) to join existing members Samuel Ortiz (Intel), Xu Wang (Hyper), and Wei Zhang (Huawei).

In October, the local China community hosted a Kata meetup in Beijing designed for large cloud providers including Alibaba, Baidu, Tencent and more to share adoption plans and feedback for the Kata Containers roadmap. The event, organized by Intel, Hyper and Huawei, shared some early prototypes of Kata Containers on NFV, edge computing, new VMM NEMU and Cloud Containers Instance products. “Kata Containers solves the problem of container isolation by integrating VM and container technologies,” said Bai Yu, senior R&D engineer at Baidu Security. “We apply Kata Containers to DuEdge edge network computing products, which greatly simplifies our resource isolation and container security design in multi-tenant scenarios.”

Development

In May, during the Vancouver OpenStack Summit, Kata landed its 1.0 release. This first release completed the merger of Intel’s Clear Containers and Hyper’s runV technologies and delivered an OCI compatible runtime with seamless integration for container ecosystem technologies like Docker and Kubernetes.

Since September, the project delivered several releases which introduced features to help cloud service providers (CSPs) and others who plan to deploy Kata into production. The 1.3.0 and 1.2.2 stable releases featured Network and Memory hotplug in order to better support CSP customers’ running production environments. The community also continued its pursuit of cross-architectural design by adding more support for ARM64 as well as Intel(R) Graphics Virtualization.

Most recently the project has made some rapid advancements with the 1.4.0 release which offers better logging, ipvlan/macvlan support through TC mirroring, and NEMU hypervisor support. View the full list of 1.4 features in this blog post.

The 1.5 release is currently planned for mid January 2019, and will offer support for containerd v2 shim among other features.

Since its launch, Kata Containers has scaled to include support for major architectures, including AMD64, ARM and IBM p-series. The Kata community is fortunate to have input from some of the world’s largest cloud service providers, operating system vendors and telecom equipment makers such that the project can better serve their internal infrastructure needs as well as enterprise customers.

Up next

The Kata Containers community will again be at KubeCon + CloudNativeCon next week to talk about the latest v1.4 release and use cases for operating secure containers. Kata will have a booth presence to provide more visibility for the project and a central location for outreach, education and collaborative discussions. If you’re attending KubeCon, come join us at booth S17 in the expo. Xu Wang of Hyper.sh, Eric Ernst of Intel, and Jon Olson of Google—members of the Architecture Committee—will join other community leaders at the event.

Kata Containers will be featured in nearly a dozen sessions, notably:

Looking ahead to next year, the Kata community will hold two open meetings on December 17, 2018 to discuss community building, advocacy, events and marketing plans for 2019. Anyone is welcome to participate. Meeting details can be found here.
Kata Containers is a fully open-source project––check out Kata Containers on GitHub and join the channels below to find out how you can contribute.

As we celebrate one year, Kata community leaders and contributors reflect on the project’s growth and impact.

“It’s been great working on Kata over the past year. I love learning, and with this project, we touch so many technical domains while living in a very dynamic ecosystem.  Even an older, mature technology like virtualization is super exciting. For me, it’s the input and contributions from the community which make the project great. Looking forward to scaling further in 2019!”

— Eric Ernst (@egernst), Member of Kata Containers Architecture Committee

 

“My Kata journey began with a marathon-like series meetings, in which we decided to merge runV and Clear Containers, in September 2017 in San Francisco, Denver, and Portland. Then we had impressive preparing meetings in Austin/Oct, Sydney/Nov, and again Austin/Dec, and announced Kata Containers together with Intel and OpenStack Foundation. And this year, in Copenhagen,  Vancouver, Beijing, Taipei, Berlin, and soon Seattle, it is Kata containers that changed our company’s and my track. Thanks for the tremendous efforts from foundation and Intel, Huawei, Google, ARM, IBM, ZTE, etc., the community has made the project become much stabler and faster, and I believe the project will become even better in the next year.”

— Xu Wang (@gnawux), Member of Kata Containers Architecture Committee

“Long long time ago, about 2 years before Kata was born, I noticed the announcement of Intel Clear Containers(CC) and Hyper runV projects, and had been attracted by the idea at once. It solves some real problems for containers, in a smart and practical way. Since then,  I joined the community and worked a lot together with Hyper and Intel. The merge of CC and runV was another exciting news as it consolidated efforts from two communities. Kata containers keeps evolving quickly over the past year, lots of new features were added, code quality and stability are better and better, more people from more companies are joining. This is the charm of open source, this is the power of open source, and this is the most interesting and coolest thing I have done in my past years! Happy birthday, Kata!”

— Wei Zhang (@WeiZhang555), Member of Kata Containers Architecture Committee

 

“It has been exciting as a member of the Kata Community and observing all the exciting changes over the last year. After experiencing the significant adoption of containers in the industry over the last few years, I was instantly captivated by Kata Containers and the idea of running containers in a VM to make it easier to sandbox container workloads. Then, the recent announcement from Amazon to open source their Firecracker micro VM project continues to ignite the excitement in the community.  I am looking forward to noticing more Kata Containers running in more production environments.”

— Rico Aravena (@raravena80), Kata Containers Contributor

Cover image: Via Lego.com

The post Contain your excitement: Kata turns one! appeared first on Superuser.

by Claire Massey at December 06, 2018 02:02 PM

December 05, 2018

OpenStack Superuser

Inside open infrastructure: News Bulletin from the OpenStack Foundation

Welcome to the very first edition of the OpenStack Foundation Open Infrastructure Newsletter, a digest of the latest developments and activities across open infrastructure projects, events and users. Sign up to receive the newsletter and email community@openstack.org to contribute.

The Berlin Summit is a wrap!
The open infrastructure community gathered in Berlin recently to share their stories and collaborate. Superuser collected over 50 things you need to know to get caught up from the Summit!

  • Over 300 session videos are now available here.
  • We kicked off a community effort to document the Four Opens. Learn about the philosophy that drives our OpenStack Foundation community and contribute your knowledge and experience!
  • Save the date and register for the Open Infrastructure Summit + Project Teams Gathering in Denver, Colorado, the week of April 29, 2019.

OpenStack project news
OpenStack is an open-source integration engine that provides APIs to orchestrate bare metal, virtual machines and container resources on a single network.

  • Hear a 5-minute technical update on OpenStack from Berlin, plus case studies from Oerlikon ManMade Fibers, Volkswagen, OVH and more.
  • Read the Vision for OpenStack clouds, a living document created by the OpenStack Technical Committee to capture the OpenStack community’s vision for the output of the OpenStack project as a whole as it evolves.
  • Subscribe to the new openstack-discuss mailing list, created to simplify discussions around the OpenStack project and unify the legacy collection of lists devoted to developers, operators and special interest groups (SIGs).
  • Rico Lin has proposed the creation of a new Autoscaling SIG. This group will coordinate activities and docs across a number of OpenStack components involved in autoscaling, like Heat, Senlin and Monasca.

Over the last year, the OpenStack Foundation has expanded the markets for open infrastructure (hybrid cloud, container infrastructure, edge computing, CI/CD and artificial intelligence & machine learning) and introduced four new pilot projects. Updates on each of these projects and how you can get involved are provided below.

Airship
Airship is a collection of loosely coupled but interoperable open source tools that declaratively automate cloud provisioning, embracing containers as the unit of infrastructure delivery at scale.

Kata Containers
Kata Containers are extremely lightweight virtual machines that feel and perform like containers, but provide the workload isolation and security advantages of VMs.

StarlingX
StarlingX is a fully featured cloud for the distributed edge, building on existing services such as Ceph, OpenStack and Kubernetes and complementing them with new services like configuration and fault management with focus on high availability (HA), quality of service (QoS), performance and low latency.

  • Watch the 5-minute keynote overview and read the project overview.
  • The team is actively working on containerization and a strategy to move to the master branch of OpenStack for the related services and work with the community on enhancements to further support edge and IoT use cases.
  • The community is planning a contributor meetup for January 15-16, 2019.
  • Learn more about the software and get involved at starlingx.io.

Zuul
Zuul is a Git-driven CI system born out of the OpenStack community that integrates code reviews and automated testing. It’s built for a world where development, testing, and deployment of applications and their dependencies are one continuous process.

  • Watch the five-minute keynote overview, then check out the BMW keynote to see how they use Zuul for their development workflow.
  • The latest release of Zuul now has support for Kubernetes-based nodepool resources.
  • If you recently updated to Zuul 3.3.0, be sure to upgrade to the latest Zuul security release, version 3.3.1 (versions earlier than 3.3.0 are unaffected).
  • Get involved at zuul-ci.org.

Questions / Feedback / Contribute
This newsletter is edited by the OpenStack Foundation staff to highlight open infrastructure communities. We want to hear from you!

If you have any feedback, news or stories that you want to share, reach us through community@openstack.org and if you would like to receive the newsletter, sign up here.

The post Inside open infrastructure: News Bulletin from the OpenStack Foundation appeared first on Superuser.

by OpenStack Foundation at December 05, 2018 08:49 PM

RDO

Community Blog Round Up 05 December 2018

Adam Young discusses OpenStack’s access policy, then deep dives to create a self trust in Keystone while Lars Kellogg-Stedman helps us manage USB gadgets using systemd as well as using ansible to integrate a password management service, then Pablo Iranzo Gómez shows how OpenStack contributions are peer reviewed.

Scoped and Unscoped access policy in OpenStack by Adam Young

Ozz did a fantastic job laying out the rules around policy. This article assumes you’ve read that. I’ll wait. I’d like to dig a little deeper into how policy rules should be laid out, and a bit about the realities of how OpenStack policy has evolved. OpenStack uses the policy mechanisms described to limit access to various APIs. In order to make sensible decisions, the policy engine needs to know some information about the request, and the user that is making it.

Read more at https://adam.younglogic.com/2018/11/scoped-and-unscoped-access-policy-in-openstack/

Systemd unit for managing USB gadgets by Lars Kellogg-Stedman

The Pi Zero (and Zero W) have support for acting as a USB gadget: that means that they can be configured to act as a USB device — like a serial port, an ethernet interface, a mass storage device, etc. There are two different ways of configuring this support. The first only allows you to configure a single type of gadget at a time, and boils down to: Enable the dwc2 overlay in /boot/config.txt, Reboot, modprobe g_serial.

Read more at https://blog.oddbit.com/2018/10/19/systemd-unit-for-managing-usb-/

Integrating Bitwarden with Ansible by Lars Kellogg-Stedman

Bitwarden is a password management service (like LastPass or 1Password). It’s unique in that it is built entirely on open source software. In addition to the web UI and mobile apps that you would expect, Bitwarden also provides a command-line tool for interacting with your password store.

Read more at https://blog.oddbit.com/2018/10/19/integrating-bitwarden-with-ans/

Creating a Self Trust In Keystone by Adam Young

Let’s say you are an administrator of an OpenStack cloud. This means you are pretty much all-powerful in the deployment. Now you need to perform some operation, but you don’t want to give it full admin privileges. Why? Well, do you work as root on your Linux box? I hope not. Here’s how to set up a self trust for a reduced set of roles on your token.

Read more at https://adam.younglogic.com/2018/10/creating-a-self-trust-in-keystone/

Contributing to OSP upstream a.k.a. Peer Review by Pablo Iranzo Gómez

In the article “Contributing to OpenStack” we covered how to prepare accounts and prepare your changes for submission upstream (and even how to find low-hanging fruit to start contributing). Here, we’ll cover what happens behind the scenes to get a change published.

Read more at https://iranzo.github.io/blog/2018/10/16/osp-upstream-contribution/

by Rain Leander at December 05, 2018 04:25 PM

December 04, 2018

Cisco Cloud Blog

Cloud Unfiltered, Episode 60: The OpenStack Edge Computing Group, with Ildiko Vancsa

Is it just me, or have we been talking about edge computing forever? OK, maybe not forever, but certainly on and off for the past ten years. Maybe longer than...

by Ali Amagasu at December 04, 2018 05:42 PM

Chris Dent

Placement Container Playground 8

I've recently made some fairly substantial changes to the container used throughout the Placement Container Playground series, so I figure an update is in order. The previous update was Playground 7.

The main changes involve the management of configuration. Throughout this process I've been trying to align myself with what are considered best practices for containers, using various blog posts, such as 12 Fractured Apps as guidance. The goal has always been a container that is as lightweight as possible, with a single responsibility, while maintaining flexibility and immutability in the same container.

To that end, the container has long had a startup.sh entrypoint that is responsible for creating a local configuration for the service based on external environment variables, and then starting the (non-configurable) uwsgi process. The changes in config mainly adjust database and authentication settings.

That's fine as far as it goes, but having a configuration file at all and having to translate environment variables into it is a bit weak. So now the server process within the container uses environment variables directly.

This required new functionality in oslo.config that I wrote, released in version 6.7.0, which allows sourcing configuration values from the environment in a predictable way.
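
To illustrate the general idea (and only the idea), here is a rough Python sketch of sourcing settings from the environment with fallback defaults, so the same immutable image can serve different deployments. This is not the oslo.config interface; the setting names and defaults are invented for the example.

import os

# Hypothetical setting names with safe fallbacks; a real deployment would map
# these onto whatever configuration options the service actually needs.
DEFAULTS = {
    "DB_CONNECTION": "sqlite://",   # fall back to an in-memory database
    "AUTH_STRATEGY": "noauth2",     # or "keystone" when auth is wired up
}


def load_settings():
    """Build the service settings from the environment, falling back to the
    defaults above, so no configuration file needs to be baked into the image."""
    return {key: os.environ.get(key, default)
            for key, default in DEFAULTS.items()}


if __name__ == "__main__":
    print(load_settings())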

Doing this revealed that all the functionality for using the container with keystone was broken. This is now fixed and the README has been updated accordingly.

Also, in the interim, the placement service moved to its own repo, requiring a few other updates in the Dockerfile and the README.

The container continues to be built semi regularly on dockerhub based on when new functionality shows up in placement, and is automatically tested by placecat.


by Chris Dent at December 04, 2018 04:50 PM

OpenStack Superuser

How you can contribute to the Four Opens book

OpenStack grew out of the belief that a community of equals, working together in an open collaboration, would produce better software. That software would be better aligned to the needs of its users and, in turn, more widely adopted.

At the most recent Summit in Berlin, a book about the Four Opens was launched.
The Four Opens are:

  • Open source
  • Open design
  • Open development
  • Open community

After eight years, the Four Opens have proved pretty resilient, consistently managing to capture the OpenStack way of doing upstream open source development. They are instrumental in the success, the quality and the visibility of the OpenStack software.

Earlier this year, the OpenStack Foundation started the Four Opens Book as a way to share these learnings: how the Four Opens were initially intended for upstream development, and how they have proved applicable to downstream activities such as gathering user feedback, marketing and event management.

As the OpenStack Foundation grows to more generally support open infrastructure, the Four Opens will grow alongside it.

The effort was spearheaded by Chris Hoge, strategic program manager at the OSF, who wants to underline that you’re still welcome to contribute and share your open source learnings as this book continues to evolve and reflect community values. Hoge tells Superuser he’s especially interested in contributions outlining how the 4Os are being applied in practice.

The post How you can contribute to the Four Opens book appeared first on Superuser.

by Superuser at December 04, 2018 02:51 PM

December 03, 2018

Ben Nemec

Upstream OpenStack Performance and Release-Shaming

These topics may seem like strange bedfellows, but trust me: there's a method to my madness. Originally this was going to be part of my Berlin summit post, but as I was writing it got rather long and I started to feel it was important enough to deserve a standalone post. Since there are two separate but related topics here, I've split the post. If you're interested in my technical thoughts on upstream performance testing, read on. If you're only interested in the click-baity release-shaming part, feel free to skip to that section. It mostly stands on its own.

Performance

This came out of a session that was essentially about OpenStack performance, specifically how to quantify and test it. Unfortunately, in many cases the answer to the former was "it depends". Obviously it depends on your hardware. Bigger hardware generally means better performance. It also depends on the drivers you are using. Not all virt/network/storage drivers are created equally. Then there's your architecture. How is your network laid out? Are there only fat pipes between some node types or does everything have pretty equal connectivity?

This is of some interest to me because my first job out of college was on a performance team. That also means I have a pretty good understanding of what it takes to extensively performance test a major piece of software. Spoiler alert: It's a lot!

We had a team of 4 or 5 people dedicated primarily to performance testing and improvement. And that was just for my department's specific aspect of the product. There was a whole separate "core" performance team that was responsible for the overall product.

Besides people, there were also racks and racks of hardware needed. Though somewhat less once I got the load testing client software running on Linux instead of Windows, which reduced the hardware requirements to drive the tests by 5 or 10x. Still, we had a sizable chunk of a data center set aside exclusively for our use.

You also need software that can run tests and collect the results in a consumable format. Fortunately this seems to be a solved problem for OpenStack as Browbeat has been around for a few years now and Rally can also drive workloads against an OpenStack cloud.

That's three major pieces - people, hardware, and software - that you need to do performance testing well. OpenStack currently has one of those things that is available upstream. People and hardware? Not so much. I suspect part of the problem is that OpenStack vendors see their performance testing and tuning as a value-add and thus don't tend to publish their results. Certainly Red Hat is doing performance testing downstream, but to my knowledge the results aren't publicly available.

Is this the ideal situation? That depends on how you look at it. From a technical standpoint it would be far better if we had a dedicated upstream team doing regular performance testing against bleeding edge versions of OpenStack. That would allow us to catch performance regressions much faster than we generally do now. Downstream testing is likely happening against the last stable release, and thus is ~6 months behind at all times. On the business side, though, it makes more sense. To a large extent, what Red Hat (for example) is selling is expertise. That includes the expertise of our performance team. If you want their help tuning your cloud, then you pay us for a subscription and you get access to that knowledge. The software is free, but anything beyond that is not. So, ideal? No, but the reality of corporate-sponsored open source probably necessitates it.

The hardware side is also tricky. Upstream OpenStack/OpenDev infra is exclusively populated by public cloud resources. These are inherently unsuited to performance testing because they are all on shared hardware whose performance can vary significantly based on the amount of load on the cloud. To do proper performance testing you need dedicated hardware with as few variables from run to run as possible. Even when upstream infra has had bare hardware donated to them, in many cases it didn't last. Apparently it's more common for companies to take back hardware donations than cloud resources. Seems odd, I know, but that was the experience related in this session.

So, what will it take to improve the state of upstream performance testing? Probably someone with moderately deep pockets to pay for the time and hardware needed, and who has a vested interest in improving performance upstream. Not a terribly promising answer, I realize, but that's the nature of the beast. This isn't a problem someone can solve by going heads down on it for a week. It takes an ongoing investment, and unless someone's revenue stream is dependent on it I'm not sure I see how it will happen.

Release-Shaming

However, as you can see from the title there was one other thing in this session that I wanted to touch on. Specifically, a rather extended digression where participants in the discussion browbeat (pun entirely intended) the session leaders for being on an older release. I should note that I came into this session late so it's possible I missed some context for where this came from, but even if that's the case I find it concerning.

Don't get me wrong, to some extent it's a valid point. If you come to upstream and say "we've got a performance problem on Mitaka", upstream's only answer can be "sorry, our Mitaka branches have been gone for a while". But the session wasn't about fixing Mitaka performance, it was (as I mentioned 800 words earlier) about testing and quantifying performance of upstream in general.

Further, this touches on some feedback we've gotten from operators in the past. Specifically that they don't always feel comfortable discussing things with developers if they're not on the latest release. That's why they like the ops meetup - it's a safe space, if you will, for them to discuss their experience with OpenStack and not have to worry about someone jumping on them for not CD'ing master. I exaggerate, but you see my point. This discussion, and apparently previous discussions, with developers was unnecessarily hostile toward the operators. You know, the people we're writing our software for.

The funny thing is that the discussion kind of shined a light on the exact problem the session was trying to solve. At one point there was basically a list of changes that have been made to improve performance since Mitaka. Which is fine, but did that actually improve performance or do you just think it did? In my experience OpenStack performance does not necessarily improve every cycle. Granted, I no longer operate a cloud at any sort of scale, but as a user of OpenStack I actually find that things seem more sluggish on recent releases than they did 3 years ago. As a result I feel the burden of proof is on the developers to show that their changes actually improved performance. How do you prove that? Performance testing! What the session was about in the first place!

In the interest of fairness, I do agree that it would be better on a number of levels if everyone stayed up to date on the most recent OpenStack releases. Heck, OVB on public cloud was blocked for years by the fact that public clouds were running versions of OpenStack that were too old to support some features I needed. I do get that argument. But I also understand that not every company is in a position to jump on the perpetual upgrade treadmill. The fast forward upgrades work that has been going on for a few cycles is a recognition of this, and it's something we need to keep in mind in other areas as well.

Overall I would say that the OpenStack developer community is a very civil, even downright friendly place most of the time. I hope we can extend the same to our colleagues in the operator community.

by bnemec at December 03, 2018 07:01 PM

Berlin Summit Recap

Since there was only one Oslo session and a couple of Designate sessions that I was able to attend, this update is going to be a bit of a grab-bag of topics. Hopefully I have some interesting thoughts on them. :-)

Oslo Project Update

This one should be pretty self-explanatory, but unfortunately the video isn't posted yet. According to the Foundation site they should be available mid-December. I'll update this post when that happens.

In the meantime, the slides are available right now.

[UPDATE] And of course just as I publish this the remainder of the Summit videos go up. :-)

Here's the Oslo project update

Image Encryption Library

There is currently a group working on end-to-end image encryption in OpenStack. This is to say that they want to be able to upload an image that has been encrypted on the client side, store that image in Glance, and then have Nova/Cinder/etc. decrypt it as needed. There are more details on the etherpad for image encryption code.

It turns out that this is a rather tricky thing to do because it crosses so many project boundaries. It requires support in Glance, Nova, Cinder, openstackclient, and openstacksdk. It also means each of those projects needs some amount of image encryption code, and we obviously don't want to duplicate it across all of them. Sounds like a job for Osloman! (coming soon to a theatre near you)

Actually, it’s not quite as simple as that. There are libraries that could be used for this, but some of them are not well maintained. Others have a long list of dependencies, which is problematic for things like the sdk and client that ideally should be lightweight. As of this writing there is extensive discussion around how to handle this in the oslo spec. If you have interest or input on this topic please do weigh in there.

Storyboard Migration

We had another good discussion about this in Berlin. Other than the lack of attachment support in Storyboard (which there is a story to address), many of the major blockers seem to be resolved. There was still quite a bit of discussion around usability of things like search and new story creation, but I don't see those as blockers for migrating Oslo.

Attachments may be an issue as I believe there are some existing Launchpad bugs with attachments. For migration purposes we can always refer back to the original Launchpad bug, but at the moment it would mean that users couldn't upload any new attachments. Before we pull the trigger on the migration, the Oslo team will need to have a discussion around how important attachments are.

The main thing that has been holding us up so far is lack of migration of bug triage information. This is important to me, especially since I spent quite a bit of time over the past couple of cycles getting the untriaged bug list in Oslo down to a manageable size. I'd rather not lose that work when we migrate to Storyboard. Fortunately, the Storyboard team was very receptive to adding priority migration support to the existing tool, so hopefully this won't be an issue much longer. I should also note that Doug Hellmann has written a script for updating story tags based on imported priority. If we run that against all of the imported stories it should do what we want, although ideally it would be integrated into the migration tool itself so everyone benefits.

Designate

There was both a design session for shared zones as well as a project feedback session for Designate. Unfortunately I'm a bit light on details because I can't find the etherpads for either. On a very basic level, Graham was willing to help spec out new features, but wasn't going to have time to implement them. Fortunately, there was a verbal commitment in the room to have people assigned to work on some of the feature requests.

If/when I find the etherpads (I'm pretty sure they exist) I'll add more to this section.

Concurrency Limits

This ended up being long enough that I thought it deserved its own post.

Community Goals for Train

Lots of good options, not many that were 100% ready to be a community goal as of this discussion. This is important because we've repeatedly gotten feedback on the goals that it is very unhelpful when something is made a goal, but the pre-work for the goal hasn't been done yet so it becomes a moving target. You can find details on the session etherpad for all of them. Hopefully someone can champion a couple of them so they are ready to go by Train.

I will also take this opportunity to discuss number 12 on the etherpad since I talked about it in the session. It's actually Graham Hayes' idea, but during the goals session he was giving the Designate project update (once again Designate was the last session of the Summit :-). In short, he has proposed improvements to the oslo.middleware healthcheck to make it more useful for containers and root cause analysis. Currently the healthcheck does little more than verify that the REST API of a service is responding and possibly provides some details about the status of the service (which may or may not be good, depending on your security stance. The detail behavior is disabled by default for that reason). Graham's proposal is to extend that so the healthcheck can also verify things like database or messaging access. Service-specific plugins could be provided for services that need to talk to backends so they could verify those connections as well.

The benefit of doing this is that pretty much every modern container ecosystem expects to be able to healthcheck the application running in a container. This allows them to automatically handle service outages to some extent, and right now the healthchecks being used with OpenStack services are fairly ad-hoc and limited. Providing a single method of running healthchecks would be a big improvement. In addition, the self-healing SIG is interested because this healthcheck architecture could be used for root cause analysis. If your healthchecks are reporting on specific aspects of a service's connectivity then it can be helpful in tracing a problem back to its source. As a simple example, if you have an outage and see in your healthcheck logs that 18 services lost their connection to rabbitmq at the same time you know that you should probably start looking at rabbitmq.
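
To make that concrete, here is a minimal sketch of what one backend connectivity check could look like. It deliberately avoids the real oslo.middleware plugin interface; the class name, method name and result format are invented for the illustration, and a real plugin would reuse the service's own database or messaging connection rather than a raw socket probe.

import socket

class DatabaseCheck:
    """Report whether the service can still reach its database backend."""

    def __init__(self, host, port, timeout=2.0):
        self.host = host
        self.port = port
        self.timeout = timeout

    def run(self):
        # A trivial TCP reachability probe, standing in for a real check that
        # would issue something like "SELECT 1" over the service's DB layer.
        try:
            with socket.create_connection((self.host, self.port),
                                          timeout=self.timeout):
                return (True, "database reachable")
        except OSError as exc:
            return (False, "database unreachable: %s" % exc)

if __name__ == "__main__":
    healthy, detail = DatabaseCheck("db.example.com", 3306).run()
    print(healthy, detail)

A healthcheck endpoint would aggregate results like these across plugins, and the per-backend detail is what enables the root cause analysis use case described above.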

Fortunately, the oslo.middleware healthcheck already supports plugins, so Graham has prototyped some changes that would allow the existing middleware to behave as needed. However, he probably won’t have time to push this through to completion, so consider this post a bit of a call for help as well. If this sounds like something you’d be interested in, please contact the Oslo team and we can discuss how to move forward.

Given that this is firmly in the "good idea, but not ready" category for Train goals, I currently wouldn't push for it to be selected as a goal. However, I did want to get it some visibility because I think it's something we should do in the near future.

Conclusion

And once again I ended up having way more to say about Summit than I realized when I started writing. :-)

I hope this was interesting and if you have any comments/questions/good knock-knock jokes (only good ones!) don't hesitate to contact me.

by bnemec at December 03, 2018 07:01 PM

OpenStack Superuser

What the OpenStack Upstream Institute is all about

Believe it or not, it’s that time of year again. Now, I know that because of the abundance of holiday music blaring through store speakers you might think I’m referring to Christmas, but alas, jolly ole St. Nick is going to have to wait. I’m talking about the OpenStack Summit, a time for contributors and users from around the world to join together, break bread and share ideas.

Before the conference even kicked off, however, a small group of enthusiasts gathered in what quickly started to feel like the basement of the Berlin City Cube. Their goal was to learn how to become active members of the OpenStack community.

OpenStack Upstream Institute was designed by the OpenStack Foundation to share knowledge about the different ways to contribute to OpenStack. The program was built with the principle of open collaboration in mind and was designed to teach attendees how to find information, as well as how to navigate the intricacies of the technical tools for each project.

Over the day-and-a-half long course, attendees are given the chance to have hands-on practice in a sandbox environment that helps prepare them to develop, test, prepare and upload new code snippets or documentation for review.

However, their learning is not limited to simply technical training; attendees also learn about the culture of OpenStack and what it means to be a part of the OpenStack Community. For example, in order to develop for OpenStack it is helpful to understand the Four Opens:

1) OpenStack is open source, released under the Apache License 2.0, and will remain fully open source. This means there will never be an enterprise edition of OpenStack.

2) Open design means that OpenStack is committed to an open design process; every release cycle includes face-to-face events that are open to everyone, so that the OpenStack community can control the design process.

3) Open development states that OpenStack source code will be maintained in a publicly accessible repository during the entire development process.

4) Open community: a decision-making policy which assumes general consent if no responses are posted within a defined period.

See the Upcoming Trainings page for details on the Institute – training is occasionally offered outside Summits.

Are you interested in becoming an OpenStack contributor or active member of the OpenStack community but can’t make it to the Upstream Institute training?

Consider the following steps:

  1. Pick an OpenStack project to try.
    Can’t decide? Take a look at the documentation, as every user relies heavily on documentation regardless of the project resources being used.
  2. Sign up for the OpenStack mailing list that falls within your interests
  3. Join OpenStack IRC.
  4. Review the Upstream Institute Training slides.
  5. Check out the Contributor guide.

About the author

Ell Marquez has been part of the open-source family for a few years now. In this time, she has found the support needed from her mentorship relationships to grow from a Linux Administrator to an OpenStack technical trainer at Rackspace. Recently, she took the leap to join Linux Academy as a technical evangelist.

The post What the OpenStack Upstream Institute is all about appeared first on Superuser.

by Ell Marquez at December 03, 2018 05:22 PM

November 30, 2018

OpenStack Superuser

Detangling Neutron and OVN database consistency

In this post I’ll talk about a problem that affects many (if not all) drivers in OpenStack Neutron and how it was solved for the OVN driver (AKA networking-ovn).

Problem description

In a common Neutron deployment model, multiple neutron-servers will handle the API requests concurrently. Requests that mutate the state of a resource (create, update and delete) will have their changes first committed to the Neutron database and then the loaded SDN driver is invoked to translate that information to its specific data model.
As the following diagram illustrates:

The model above can lead to two situations that can cause inconsistencies between the Neutron and the SDN databases:

Problem 1: Same resource updates race condition

When two or more updates to the same resource are issued at the same time and handled by different neutron-servers, the order in which these updates are written to the Neutron database is correct; however, the methods invoked in the driver to update the SDN database are not guaranteed to follow the same order as the Neutron database commits. That could lead to newer updates being overwritten by old ones on the SDN side, resulting in both databases becoming inconsistent with each other.

The pseudo code for these updates looks something like this:

  In Neutron:

    with neutron_db_transaction:
         update_neutron_db()
         driver.update_port_precommit()
    driver.update_port_postcommit()

  In the driver:

    def update_port_postcommit:
        port = neutron_db.get_port()
        update_port_in_southbound_controller(port)

This problem has been reported at bug #1605089.

Problem 2: Backend failures

The second situation is when changes are already fully persisted in the Neutron database but an error occurs upon trying to update the SDN database. Usually, what drivers do when that happens is try to immediately roll back those changes in Neutron and then throw an error, but that rollback operation itself could also fail.

On top of that, rollbacks are not very straightforward when it comes to updates or deletes. For example, when a VM is being torn down, part of the process includes deleting the ports associated with that VM. If the port deletion fails on the SDN side, re-creating that port in Neutron does not fix the problem. Decommissioning a VM involves many other things; in fact, recreating that port could make things even worse because it would leave dirty data around.

The networking-ovn solution

The solution used by the networking-ovn driver relies on the Neutron’s revision_number attribute. In short, for each resource in the Neutron database there’s an attribute called revision_number which gets incremented on every update, for example:

$ openstack port create --network nettest porttest
...
| revision_number | 2 |
...

$ openstack port set porttest --mac-address 11:22:33:44:55:66

$ mysql -e "use neutron; select standard_attr_id from ports \
where id=\"91c08021-ded3-4c5a-8d57-5b5c389f8e39\";"
+------------------+
| standard_attr_id |
+------------------+
|             1427 |
+------------------+

$ mysql -e "use neutron; SELECT revision_number FROM \
standardattributes WHERE id=1427;"
+-----------------+
| revision_number |
+-----------------+
|               3 |
+-----------------+

The revision_number attribute is used by networking-ovn to solve the inconsistency problem in four situations:

1. Storing the revision_number in the OVN database

To be able to compare the version of the resource in Neutron against the version in OVN, we first need to know which version the OVN resource is present at.

Fortunately, each table in the OVN Northbound database [1] contains a special column called external_ids which external systems (such as Neutron) can use to store information about their own resources that corresponds to the entries in the OVN database.

In our solution, every time a resource is created or updated by networking-ovn, the Neutron revision_number related to that change will be stored in the external_ids column of that resource in the OVN database. That allows networking-ovn to look at both databases and detect whether the version in OVN is up-to-date with Neutron or not.
Here’s how the revision_number is stored:

$ ovn-nbctl list Logical_Switch_Port
...
external_ids        : {"neutron:cidrs"="",
"neutron:device_id"="", "neutron:device_owner"="",
"neutron:network_name"="neutron-139fd18c-cdba-4dfe-8030-2da39c70d238",
"neutron:port_name"=porttest,
"neutron:project_id"="8563f800ffc54189a145033d5402c922",
"neutron:revision_number"="3",
"neutron:security_group_ids"="b7def5c3-8776-4942-97af-2985c4fdccea"}
...

2. Performing a compare-and-swap operation based on the revision_number

To ensure accuracy when updating the OVN database, specifically when multiple updates are racing to change the same resource, we need to prevent older updates from overwriting newer ones.

The solution we found for this problem was to create a special OVSDB command that runs as part of the transaction that is updating the resource in the OVN database and prevents changes with a lower revision_number from being committed.
To achieve this, the OVSDB command does two things:

1 – Add a verify operation to the external_ids column in the OVN database so that if another client modifies that column mid-operation the transaction will be restarted.

A better explanation of what “verify” does is described in the doc string of the Transaction class in the OVS code itself, I quote:

“Because OVSDB handles multiple clients, it can happen that between the time that OVSDB client A reads a column and writes a new value, OVSDB client B has written that column. Client A’s write should not ordinarily overwrite client B’s, especially if the column in question is a “map” column that contains several more or less independent data items. If client A adds a “verify” operation before it writes the column, then the transaction fails in case client B modifies it first. Client A will then see the new value of the column and compose a new transaction based on the new contents written by client B.”

2 – Compare the revision_number from the Neutron update against what is presently stored in the OVN database. If the version in the OVN database is already higher than the version in the update, abort the transaction. Here’s a pseudo scenario where two concurrent updates are committed in the wrong order and how the solution above will deal with the problem:

Neutron worker 1 (NW-1): Updates port A with address X (revision_number: 2)

Neutron worker 2 (NW-2): Updates port A with address Y (revision_number: 3)

TRANSACTION 1: NW-2 transaction is committed first and the OVN resource
now has revision_number 3 in it's external_ids column

TRANSACTION 2: NW-1 transaction detects the change in the external_ids
column and is restarted

TRANSACTION 2: NW-1 the OVSDB command now sees that the OVN resource
is at revision_number 3, which is higher than the update version
(revision_number 2) and aborts the transaction.
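
In the same pseudo-code style as the earlier snippets, the check-and-set logic of that OVSDB command boils down to something like this (FakeRow and the helper names are stand-ins for illustration, not the actual networking-ovn code):

class RevisionConflict(Exception):
    """Raised to abort the transaction when the incoming update is stale."""

class FakeRow:
    """Stand-in for the OVSDB row object used by networking-ovn."""
    def __init__(self, external_ids):
        self.external_ids = external_ids

    def verify(self, column):
        # In a real OVSDB transaction, verify() causes the transaction to be
        # restarted if another client modified this column in the meantime.
        pass

def check_revision_number(row, new_revision):
    # 1) protect external_ids against concurrent writers
    row.verify('external_ids')
    current = int(row.external_ids.get('neutron:revision_number', -1))
    if current > new_revision:
        # 2) a newer update already reached OVN; abort instead of overwriting it
        raise RevisionConflict(
            'OVN row is at revision %d, update carries revision %d'
            % (current, new_revision))
    row.external_ids['neutron:revision_number'] = str(new_revision)

row = FakeRow({'neutron:revision_number': '3'})
try:
    check_revision_number(row, 2)
except RevisionConflict as exc:
    print(exc)   # the stale revision 2 update is rejected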

3. Detecting inconsistent resources

When things are working as expected, the two items above should ensure that the Neutron and OVN databases are in a consistent state. But what happens when things go bad? For example, if the connectivity with the OVN database is temporarily lost, new updates will start to fail and, as stated in problem 2, rolling back those changes in the Neutron database is not always a good idea.

Before this solution, we used to maintain a script that would scan both databases and fix all the inconsistencies between them but, depending on the size of the deployment, that operation could be very slow. We needed a better way.

The solution for the detection problem was to create an additional table in the Neutron database (called “ovn_revision_numbers”) with a schema that looks like this:

Column name      | Type     | Description
standard_attr_id | Integer  | Primary key. The reference ID from the standardattributes table in Neutron for that resource. ON DELETE SET NULL.
resource_type    | String   | Primary key. The type of the resource (e.g. ports, routers, …)
resource_uuid    | String   | The UUID of the resource
revision_number  | Integer  | The version of the object present in OVN
acquired_at      | DateTime | The time that the entry was created. For troubleshooting purposes
updated_at       | DateTime | The time that the entry was updated. For troubleshooting purposes

This table serves as a “cache” for the revision numbers corresponding to the OVN resources in the Neutron database.

For the different operations (create, update and delete), this table is used as follows:

Create operation

Neutron has a concept of “precommit” and “postcommit” hooks for the drivers to implement when dealing with its resources. Basically, the precommit hook is invoked mid-operation, when Neutron’s database transaction hasn’t been committed yet. Also important, the context passed to the precommit hook contains a session to the active transaction which drivers can use to make other changes to the database that will be part of the same commit. In the postcommit hook, the data is already fully persisted in Neutron’s database and this is where drivers are supposed to translate the changes to the SDN backend.

Now, to detect inconsistencies for a newly created resource, a new entry in the ovn_revision_numbers table is created in the precommit hook for that resource, as part of the same database transaction. The revision_number column for the new entry holds a placeholder value (we use -1) until the resource is successfully created in the OVN database (in the postcommit hook), at which point the revision_number is bumped. If a resource fails to be created in OVN, the revision_number column in the ovn_revision_numbers table for that resource will still be set to -1 (the placeholder value), which is different from its corresponding entry in the standardattributes table (which is updated as part of Neutron’s database transaction).

Comparing the revision_number values in these two tables is how inconsistencies are detected in the system.

The pseudo code for the create operation looks like this:

def create_port_precommit(context, port):
    create_initial_revision(port['id'], revision_number=-1,
                            session=context.session)

def create_port_postcommit(port):
    create_port_in_ovn(port)
    bump_revision(port['id'], revision_number=port['revision_number'])

Update operation

Every time an update to a resource is successfully committed in the OVN database, the revision_number for that resource in the ovn_revision_numbers table will be bumped to match the revision_number in the standardattributes table. That way, if something fails on the OVN side, the revision_number values in these two tables will be different and inconsistencies will be detected in the same way as they are in the create operation.

The pseudo code for the update operation looks like this:

def update_port_postcommit(port):
    update_port_in_ovn(port)
    bump_revision(port['id'], revision_number=port['revision_number'])

Delete operation

The standard_attr_id column in the ovn_revision_numbers table is a foreign key with ON DELETE SET NULL, which means that when Neutron deletes a resource, the standard_attr_id column in the ovn_revision_numbers table will be set to NULL.

If deleting a resource succeeds in Neutron but fails in OVN, the inconsistency can be detected by looking at all entries where the standard_attr_id column is NULL.

When the deletion succeeds in both databases, the entry in the ovn_revision_numbers table can then be deleted as well.

The pseudo code for the delete operation looks like this:

def delete_port_postcommit(ctx, port):
    delete_port_in_ovn(port)
    delete_revision(port['id'])

4. Fixing inconsistent resources

Now that the algorithm to detect inconsistencies has been optimized, we were able to create a maintenance task that runs periodically (every 5 minutes) and is responsible for detecting and fixing any inconsistencies it may find in the system.

It’s important to note that only one instance of this maintenance task will be actively running in the whole cluster at a time, all other instances will be in standby mode (active-standby HA). To achieve that, each maintenance task instance will attempt to acquire an OVSDB named lock and only the instance that currently holds the lock will make active changes to the system.

The maintenance task operation is composed of two steps:

  1. Detect and fix resources that failed to be created or updated
  2. Detect and fix resources that failed to be deleted

Resources that failed to be created or updated will have their standard_attr_id column pointing to the numerical ID of their counterpart in the standardattributes table, but their revision_number columns will differ.

When inconsistent resources like that are found, the maintenance task will first try to fetch that resource from the OVN database; if found, the resource is updated to the latest state in the Neutron database and its revision_number is bumped in the ovn_revision_numbers table. If the resource is not found, it will first be created in OVN and, upon success, its revision_number will be bumped in the ovn_revision_numbers table.

Resources that failed to be deleted will have their standard_attr_id column set to NULL. To fix this type of inconsistency, the maintenance task will try to fetch the resource from the OVN database; if found, the resource and its entry in ovn_revision_numbers are deleted. If not found, only the entry in ovn_revision_numbers is deleted.

Note that the order in which resources are fixed is important. When fixing creations and updates, the maintenance task starts with the root resources (networks, security groups, etc…) and leaves the leaf ones for the end (ports, floating IPs, security group rules, etc…). That’s because if, say, a network and a port belonging to that network both failed to be created, trying to create the port before the network would fail again and it would only be fixed on the next iteration of the maintenance task. The order also matters for deletion, but in the opposite direction: it’s preferable to delete the leaf resources before the root ones to avoid conflicts.
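
Putting the detection rules and the ordering together, one iteration of the maintenance task conceptually looks like the pseudo code below; the helper names are illustrative, not the actual networking-ovn maintenance code:

def fix_inconsistencies(neutron_db, ovn_db):
    # Entries whose cached revision_number differs from the one in
    # standardattributes failed to be created or updated in OVN.
    # Root resources (networks, security groups, ...) are fixed first.
    for entry in neutron_db.get_inconsistent_creates_and_updates(
            order='root_resources_first'):
        resource = neutron_db.get(entry.resource_type, entry.resource_uuid)
        if ovn_db.get(entry.resource_uuid) is None:
            ovn_db.create(resource)
        else:
            ovn_db.update(resource)
        neutron_db.bump_revision(entry, resource['revision_number'])

    # Entries whose standard_attr_id is NULL failed to be deleted in OVN.
    # Leaf resources (ports, floating IPs, ...) are deleted first.
    for entry in neutron_db.get_inconsistent_deletes(
            order='leaf_resources_first'):
        if ovn_db.get(entry.resource_uuid) is not None:
            ovn_db.delete(entry.resource_uuid)
        neutron_db.delete_revision(entry)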

Conclusion and other notes

This was a long post but I hope it will make it easier for people to understand with a certain level of detail how networking-ovn is dealing with this problem.

Also, in the OpenStack PTG for the Rocky cycle other Neutron driver developers were interested in implementing this approach for their own drivers. It was decided to attempt to make this approach more generic [2] and migrate the code to the Neutron core repository so drivers can consume it and avoid duplication. The specification proposing this change in Neutron can be found here.

This post first appeared on Lucas Alvares Gomes’s blog. He’s a software engineer at Red Hat currently working on OVN (Open Virtual Network) and related OpenStack projects.

 


  1. Take a look at the OVN architecture for more information about the Northbound and Southbound databases in OVN.
  2. For example, by removing the dependency on things like the OVSDB named locks. Please check the proposed specification for Neutron.

The post Detangling Neutron and OVN database consistency appeared first on Superuser.

by Lucas Alvares Gomes at November 30, 2018 03:45 PM

Chris Dent

Placement Update 18-48

Here's a placement update. It's been a rather busy week in placement-land: Gibi and I, with the help of others, have been getting the functional tests in nova working with an external placement. It is working now, which is great, but much work was done, more details within.

Most Important

Progress continues on the work in ansible, puppet/tripleo, kolla, loci to package placement up and establish upgrade processes. All of these things need review (see below). Work on GPU reshaping in virt drivers is getting close.

What's Changed

  • Devstack installs placement from openstack/placement, not nova.
  • Placement now runs tempest and grenade (in py3) in CI. This is great because we get real integration tests, and sad because our once really fast CI runs are gone.
  • The placement service can now start without a conf file. This is basically fixing a bug: we required a single config file in a specific location, rather than the defaults (which includes no file). We require an explicit database connection.

Bugs

Specs

Spec freeze is milestone 2, the week of January 7th. None of the specs listed last week have merged.

Main Themes

Making Nested Useful

Progress is being made on gpu-reshaping for libvirt and xen:

Also making use of nested is bandwidth-resource-provider:

Eric's in the process of doing lots of cleanups to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

Extraction

(There's an etherpad which tracks some of the work related to extraction. Please refer to that for additional information.)

Several deployment tools are working on making things work with the extracted placement:

A replacement for placeload performance testing that was in the nova-next job: https://review.openstack.org/#/c/619248/. This might be of interest to people trying to do testing of live services without devstack. It starts with a basic node, turns on mysql, runs placement with uwsgi, and does the placeload testing. Note that this found a pretty strange bug in _ensure_aggregates.

Documentation tuneups:

This week’s excitement with the functional tests in nova has led to some interesting changes. The biggest of these is changing placement to no longer use global config. That patch is an aggregate of the many things done to remove a race condition in the nova functional tests. Trying to use the config of two different projects in the same process space is hard if they are both global. So now placement doesn't. This is a win in many ways. Getting those two patches merged soon is important.

We've been putting off making a decision about os-resource-classes. Anyone have strong opinions?

Other

There are 12 or so open changes in placement itself. Of those these two are the most important:

Outside of placement itself, here are some related changes:

End

That was the November that was.

by Chris Dent at November 30, 2018 01:13 PM

CERN Tech Blog

Container Storage and CephFS (Part 3): Scale testing

Part 1 and Part 2 covered CSI and the CephFS/Manila drivers. In the third part of our recent train wreck to train ride series we summarize the results of a large test against Kubernetes/CSI and CephFS clusters. Spoiler: a scaling test with 10'000 concurrent clients. Our use cases often include 1000s or 10000s of concurrent jobs, all accessing large amounts of data. For this reason any new component being integrated needs to cope with this sort of scale.

by CERN (techblog-contact@cern.ch) at November 30, 2018 07:00 AM

November 29, 2018

Aptira

Software is Eating the Network. Part 1 – A Global Trend


In the second post of this series we divided the broad scope of Open Networking into three convenient “domains”: Infrastructure, Software and Integration. We have now completed our exploration of the Infrastructure domain and have seen the emergence of software in the last couple of posts, especially around network devices.

We transition now to the Software domain. 

Software is Eating the World

Marc Andreessen, Wall Street Journal essay, 2011: https://www.wsj.com/articles/SB10001424053111903480904576512250915629460

Although Andreessen's quote has become a bit overused, it is probably truer now than in 2011, and is a vital concept to keep top of mind as we look at the impact of software on Open Networking.

In 2011, Cloud Computing and its enablers were in relatively early days and had not yet become “mainstream”, let alone explosively pervasive. Marc didn’t mention “the Cloud” directly but focused mostly on SaaS companies and software incursions into various industry vertical markets like automotive, oil and gas exploration and financial services. This trend has accelerated since then. 

Take the automotive industry for example. Although computing technology has been incorporated into car designs for many years, it has been evolving on a unique architecture, designed specifically for a motor vehicle ecosystem. The main signs of this incursion to the layperson have been the ever-more-complicated diagnostic instruments appearing in maintenance facilities and the aftermarket firmware “hacks” available to increase engine performance.

Signs that the pervasiveness of software in car design had reached new heights became obvious recently. In a review of the Tesla Model 3 published on 21st May, 2018, the magazine Consumer Reports withheld its “recommended” status due primarily to issues in braking performance. In response, Tesla was able to diagnose and fix this so quickly that the magazine was able to rerun its tests and publish a modified “recommended” review on 30th May 2018, just 9 days after the initial publication. 

In a “big iron” business like automotive, this is unheard of. 

In Open Networking this trend is more pervasive: software runs through the layers of Open Networking from top to bottom. The emergence of Open Network Hardware and Open Compute devices was predicated on separating the device hardware from their respective operating systems and applications. 

At times the networking equipment and services market has seemed very “big iron”, with expensive, proprietary products and innovation cycles closely controlled by vendors.

Interestingly, Cisco itself was founded in 1984 with a CPU-based router combined with separate multi-protocol router software that later became Cisco's IOS. But the hardware and the software were proprietary to Cisco, and the competitive advantage they gave in the design, manufacture and operation of its products was largely retained within the company: the benefits of this design devolved primarily to Cisco, not to the customer. Cisco's booming profits have demonstrated this mightily.

Open Networking takes this trend further to Open Hardware devices and more Open Software development models to reduce dependency on vendors and to provide greater opportunity for enterprise users to control their networks in better and cheaper ways. 

Google has been a thought-leader in pushing commodity hardware over proprietary hardware, and in developing or using software components in preference to dedicated hardware-based components.

More recently, standards organisations have begun to include this strategy in their specifications. In the Network Function Virtualisation (NFV) world (which we’ll cover in later posts) the TMForum has enshrined this idea in 2 of the 12 principles on NFV use cases: 

9. Hardware Agnostic, separating hardware and software layers, with intelligence held in the software.

10. Resilient to hardware failure and localised load demands through intelligent software

TMForum, ZOOM/NFV User Stories. TR229 Release 14.5.1, February 2015

As Open Networking has evolved, so has the progressive disaggregation of the software stack from the underlying hardware, and the relative balance of solution components has moved more and more heavily towards software. 

As a result, if the “Network is the Computer”, then “Software is Eating the Network”. 

Software enables much faster cycle times for development and enhancements, of course, and provides the ability for rapid diffusion of new capabilities at a global scale. But the competitive advantage of this capability must devolve to the customer, not the network equipment provider or other component vendor. 

The implications of this shift are many, not the least of which is the need to transition people and organisations to new skillsets, for example the much larger role of end-to-end software lifecycle management. 

But the shifting of the balance of capability towards software provided another huge opportunity: global collaboration at scales never seen before. 

How was this opportunity enabled? It’s called Open Source Software. How does it change the dynamics of Open Networking? In every way.  

We’ll explore that more in our next post.  Stay tuned. 

The post Software is Eating the Network. Part 1 – A Global Trend appeared first on Aptira.

by Adam Russell at November 29, 2018 11:51 PM

OpenStack Superuser

How to use Kiali to monitor micro-services in your Istio service mesh

DevOps evangelist and cloud native app developer Daniel Oh contributed this post.

In the cloud-native world, many enterprise developers are considering better ways to address non-functional micro-service capabilities such as APIs, tracing, resiliency, logging, pipelines, elasticity, invocation, and authz & authn, rather than just the implementation of business logic through Spring Boot, Vert.x, Node.js and MicroProfile.

Fortunately, Istio offers a good solution for this kind of micro-services-based service mesh. However, developers still have to handle visual observability for hundreds of micro-services in a network-of-services architecture.

A micro-service architecture fundamentally breaks up the monolith into many smaller pieces that fit together. Patterns to secure the communication between services, like fault tolerance (via timeouts, retries, circuit breaking, etc.), have come up, as well as distributed tracing to be able to see where calls are going.

A service mesh can provide these services at a platform level and free the app developers from those tasks. Routing decisions are done at the mesh level. Kiali works with Istio to visualise the service mesh topology and features like circuit breakers or request rates. Kiali also includes Jaeger Tracing to provide distributed tracing out of the box.

Getting started on OpenShift

Kiali is still in development. Snapshot releases are pushed to Dockerhub from the CI pipeline.

To deploy Kiali to your Istio-enabled OpenShift cluster you can run the following. Kiali currently requires Istio version 0.8.0 (see below if you have not yet installed Istio).

Preparation

First you need to grant the user that is installing Istio and Kiali the cluster-admin role. In the following case this will be the admin user:

oc login -u system:admin
oc adm policy add-cluster-role-to-user cluster-admin -z default admin

Then log in as this admin user:

oc login -u admin -p admin

Now you can install Istio if needed. See https://istio.io/docs/setup/ for details.

Install Kiali

Then install Kiali:

curl https://raw.githubusercontent.com/kiali/kiali/master/deploy/openshift/kiali-configmap.yaml | \
   VERSION_LABEL=master envsubst | oc create -n istio-system -f -

curl https://raw.githubusercontent.com/kiali/kiali/master/deploy/openshift/kiali-secrets.yaml | \
   VERSION_LABEL=master envsubst | oc create -n istio-system -f -

curl https://raw.githubusercontent.com/kiali/kiali/master/deploy/openshift/kiali.yaml | \
   IMAGE_NAME=kiali/kiali \
   IMAGE_VERSION=latest \
   NAMESPACE=istio-system \
   VERSION_LABEL=master \
   VERBOSE_MODE=4 envsubst | oc create -n istio-system -f -

If you do not have envsubst installed, you can get it via the Gnu gettext package.

Once the above has completed and the Docker image has been pulled from Dockerhub, go to the OpenShift console, select the istio-system project and determine the base URL of Kiali.

In this case it is http://kiali-istio-system.192.168.64.13.nip.io. In your case this could be a different IP.

You can also use the oc command to determine the base-url:

oc get route -n istio-system -l app=kiali

The Kiali UI

Log in to Kiali-UI as admin/admin.

For best results, you should have an example application like ‘bookinfo’ from the Istio examples deployed.

Detailed view of a single service

Distributed tracing view

If you need more details about the Kiali project, all resources are here. Please participate and contribute to make it better.

This post first appeared on Linkedin. Superuser is always interested in open infra tutorials, get in touch: editorATopenstack.org

The post How to use Kiali to monitor micro-services in your Istio service mesh appeared first on Superuser.

by Superuser at November 29, 2018 03:17 PM

November 28, 2018

OpenStack Superuser

Zuul case study: Tungsten Fabric

Zuul drives continuous integration, delivery and deployment systems with a focus on project gating and interrelated projects. In a series of interviews, Superuser asks users about why they chose it and how they’re using it.

Here Superuser talks to Jarek Łukow, CI engineer at Codilime, about Tungsten Fabric. Łukow currently operates the OpenStack-backed CI/CD system for Tungsten Fabric (formerly OpenContrail). In one of its primary deployment scenarios, Tungsten Fabric serves as a networking provider for OpenStack. “This means that we face challenges similar to the OpenStack project in our continuous integration system,” Łukow says. As a result, Tungsten Fabric uses Zuul to gate changes made in over 70 source-code repositories.

The days of annual, monthly or even weekly releases are long gone. How is CI/CD defining new ways to develop and manage software in your work?

One of Tungsten Fabric's strengths is the wide variety of integrations and deployment scenarios it provides. Automated testing ensures that new changes and features keep the CI system working across the required range of supported systems, platforms and versions. It also helps us respond faster to changes in software we integrate with. Fortunately, the CI system has been in place from the very beginning and Zuul has always been a large part of its history. At the beginning of this year we migrated from Zuul 2 to 3, making our core continuous integration and delivery pipeline powered entirely by Zuul and its companion services.

What features drew you to Zuul?

I would say that the most important features, in terms of positive impact on our operations, are dynamically spawned disposable slaves and cross-project change dependencies.
The Nodepool service duo (builder and launcher) provides automated image provisioning and CI cluster scalability. This ensures the jobs run in well-defined and reproducible environments, effectively relieving us of the need to maintain a static pool of shared slaves. The cross-project dependency mechanism is important because our project consists of over 70 repositories and 30 of them are required to build the core of the system. The feature is very handy for the developers and gives them more confidence when testing changes that span multiple repos.

How are you currently using Zuul?

Zuul tests every change going into Tungsten Fabric's Gerrit review system. It manages the process of compilation, packaging and containerization of our software. The VM-based test environments are well suited for us as some of the tested components need exclusive access to an underlying operating system. We also re-use the build pipelines (or workflows, not to be confused with the Zuul nomenclature) to perform periodic builds and releases.

What other benefits have you seen so far?

The Git-centric Zuul approach of doing things has helped us to get to the “version control everything” model, where all the things, from the CI jobs, through the third-party dependencies to the infrastructure configuration, are version-controlled, reviewed by team members and verified by the CI system.
Also, using Ansible as the main job DSL plays out really well in our organization, as many of the people working on our Zuul jobs are already familiar with it and can jump in without needing to learn any new syntax.

What challenges have you overcome?

Understanding trusted and untrusted job contexts is a very important part of learning Zuul and can help you get the most out of it. We started with all of the job configuration in a single config-project and decided to split it into a separate untrusted project afterwards. We recommend designing your Zuul jobs so that only the really essential minimal parts are kept in trusted projects and the rest in untrusted ones, so they can be easily developed and tested. Additionally, as some parts of our infrastructure run inside a private network, we had to do some additional plumbing to use a worker cloud from behind a firewall.

What can you tell us about future plans?

The next things on our roadmap are: running nested Hyper-V and ESXi testbeds directly under Nodepool/Zuul v3 and making first trials of the third-party CI approach for performance testing. We expect Zuul to shine in these fairly complex test scenarios.

What are you hoping the Zuul community continues to focus on or delivers?

Zuul has proven to be a reliable and well-designed tool for some of the most complex software projects. Now it may require some more integrations beyond the core ones to become a good fit for a wider group of potential Zuul adopters.
New connection types and Nodepool drivers can be beneficial for current users, like us, to discover new ways of running and optimizing pipelines and will also attract new users that may have specific platform requirements.

Anything else you’d like to add?

I want to acknowledge the quality of the Zuul software suite. We’ve been running its bleeding-edge pre-release version for a long time and did not notice any kinds of instability. The documentation is rock solid, pretty much every feature is tested right away by running in production for one of the largest CI environments – the OpenStack project. Also the community is impressively responsive via IRC and mailing lists.

For more details, check out this session on lessons learned from the Tungsten implementation from the recent Berlin Summit and this blog post over at Codilime.

Superuser is always interested in community content, get in touch at editorATsuperuser.org

The post Zuul case study: Tungsten Fabric appeared first on Superuser.

by Nicole Martinelli at November 28, 2018 05:35 PM

CERN Tech Blog

Container Storage and CephFS (Part 2): Auto provisioning Manila shares

Part 1 of this series covered CSI, Kubernetes and the CephFS CSI driver. In this second part we look at the integration with OpenStack Manila and how we auto provision filesystem shares. Check our recent presentation at the OpenStack Summit Berlin - Dynamic Storage Provisioning of Manila/CephFS Shares on Kubernetes, aka “From a train wreck to a train ride”. Why In many cases our users will explicitly create their filesystem shares in advance, and will reuse them over an extended period of time - an example of this would be a filesystem backing a database.

by CERN (techblog-contact@cern.ch) at November 28, 2018 07:00 AM

November 27, 2018

OpenStack Superuser

How to run an open source user group

Running an open source user group can be lots of fun but it sure is lots of work!

Between finding a venue, finding interested speakers (or building interesting content yourself) and promoting the event, you’ll be busy. However, you’ll definitely find it rewarding and it’s a great way to get involved in the community. I’m going to be sharing with you some of my tricks to running a successful user group.

Getting started

  • Presentations are boring but hands-on sessions are a blast. One of my favorite sessions is a basic introduction to cloud computing. When I post the session, I make sure to communicate that it's beginner friendly and that no previous knowledge is required. I'll even stay away from mentioning any particular technologies so I don't scare people away. I'll call the session “Clouds & Coffee” or “Cloud Computing Basics.”
  • Promote your group through an Open Source foundation. Both the CNCF  and the OpenStack Foundation provide sponsorships through meetup.com. This will cover the expenses involved with publishing events on meetup.com and get the word out to potential attendees. I also let local universities know – students love to attend technical meetups.
  • Need a venue? Through Meetup, you can book a space (for free) at WeWork. When you create your first meetup event, you'll be presented with a link to host the event at WeWork. Simply click through, book a room, and you're set! Start off small with one of their conference rooms (six to 10 people).
  • Public OpenStack clouds are your friends. Most of the public OpenStack cloud providers have basic tutorials available online that can form the basis of a user group session. For example, CityCloud has a basic tutorial on spinning up your first virtual machine. There you go, no need to put together any instructions, we’ll just point the attendees to the online post and walk them through it!

Ingredients for a productive meetup

  • Introduce yourself and others! Before you get started, take a moment and let everyone introduce themselves and what brought them to the user group. This is a good way to find future speakers and future venue hosts. Right before the introductions, I like to start with a little video to give people a chance to find a seat and settle things down. “The World Runs OpenStack” is one of my favorite opening videos.
  • Share details on an Etherpad! Before a meetup, I'll create a bunch of generic cloud user accounts, listing them in an Etherpad (e.g. user01, user02, user03) along with the session instructions. The beauty of Etherpad is anyone can update the file seamlessly. That helps the flow of communication amongst the attendees. You can create your own Etherpad for free at https://etherpad.openstack.org/.
  • “Start here” goes on the whiteboard. As people arrive, I have the starting point of the session listed up on the white board. This includes the link to the session Etherpad, WiFi details, and any passwords required. Each person will claim a cloud account by simply updating the Etherpad with his/her name alongside the cloud account name. Latecomers can simply check the whiteboard and get started!
  • Get things rolling with an ice breaker. These user sessions are supposed to be social! Before diving into the meat of the session, make sure to warm things up. Perhaps have each person explain, in a sentence or two, what brought him/her to the group and his/her learning goals. Or, if you've got a prize to give away, try a group round of rock-paper-scissors where the winner of each duel goes on (single elimination) until there's a group winner. Everyone enjoys getting out of the chair and laughing for a few minutes.
  • Walk around – stay social. While running the session, stay away from the podium and instead walk among attendees, helping them through the tutorial. I will often leave my screen (via a projector) on the Etherpad so attendees can look up and see the next steps (without having to tab between windows on their laptops). I'll check to see if anyone is stuck and generally keep a pulse on how things are progressing. If someone gets stuck, I'll update the Etherpad with clarifications.

Have fun with your new group!

About the author

John Studarus, president of JHL Consulting, provides cloud security product development and cloud security consulting services. Within the open source communities, he volunteers his time managing the community supported Packet CI cloud and running numerous user groups across the U.S. as an OpenStack Ambassador.

The post How to run an open source user group appeared first on Superuser.

by John Studarus at November 27, 2018 03:09 PM

StackHPC Team Blog

Federation and identity brokering using Keycloak

Federated cloud deployments encompass an ever-evolving set of requirements, particularly within areas of industry, commerce and research supporting high-performance (HPC) and high-throughput (HTC) scientific workloads. It's an area in which OpenStack really shines, through excellent support for federation protocols and its standard API for the manipulation of infrastructure primitives, but at the same time the reality is that no two deployments are entirely alike - and this can cause problems for both users and operators.

If you're an optimistic sort then you could in fact view this as a strength of the platform, as it means that a given installation can be tailored according to the workload. For example, you might have one installation at a particular institution that's designed for provisioning and presenting interfaces to databases, and another which is developed to run a HPC job scheduler such as SLURM. Practically speaking though, the fact that each is installed on completely different selections of hardware, each according to the workload, is of no interest to our users. What they do care about is having access to each in a way that isn't bogged down with too much bureaucracy or burdensome tooling.

To that end, one of the key themes that overarches almost any architectural discussion with regards to federated cloud workloads is that of authentication and authorisation infrastructure (AAI). The need to provide a secure and compliant solution, yet one which is (ostensibly) seamless and as pain-free as possible for users, is certainly a challenge - and getting this right is fundamental to a platform's adoption and its success.

Fortunately there are myriad tools and technologies available to help meet this. Keycloak is one such application, and it's one that we at StackHPC have been experimenting with as a proof-of-concept in order to provide the AAI 'glue' between two cloud deployments that we help to operate.

A few people have asked us to share our experiences and so this blog post is an attempt at summarising those. Also note that this post focuses on browser-driven interactions, as you'll see that a lot of the redirections make use of WebSSO where a web browser is required. A future blog post will delve into interactions using the API via OpenStack and related CLI tools.

Introducing Keycloak

First and foremost, Keycloak is an identity and access management service, capable of brokering authentication on behalf of clients using standard protocols such as OpenID Connect (OIDC) and OASIS SAML. It's the upstream version of RedHat's enterprise Single Sign-On offering and as such is well supported, developed and maintained.

Keystone already has good support for federated authentication in a variety of contexts, and there are existing services such as EGI's Check-in and Jisc's Assent that provide identity brokering using compatible protocols, so why introduce another moving component into the mix?

There are a number of reasons:

  • You might want to be able to associate identity from a number of different sources for a particular user's account;
  • You might want to standardise the authentication protocol for Keystone on OpenID Connect but offer support for integration with identity providers using SAML;
  • A hub-and-spoke architecture (as outlined in the AARC Blueprint) might be preferable to tightly coupling individual OpenStack clouds with one another in a full-mesh configuration, which is the case when using Keystone-to-Keystone;
  • You might need to be able to federate user authentication via internal sources (such as an Active Directory instance);
  • ... and so on.

For us, it was about being able to add another layer of control and flexibility into the authorisation piece. A proof-of-concept federated OpenStack using two disparate deployments was integrated using the aforementioned EGI Check-in solution, and while this worked well we often found ourselves wanting more control over the attributes, or claims, that are presented and subsequently mapped via Keystone as part of the authorisation stage.

Identity brokering with Keycloak

Authentication

Let's review how Keycloak fits into the equation. A user makes a resource request via their service provider, which in turn expects them to be authenticated. When Keystone is configured to use an identity provider (IdP), the user is redirected to the IdP's landing page - which in our case is Keycloak. Here the user is presented with a selection of login choices. Depending on their selection, they're then redirected a second time in order to perform the authentication step, and then Keycloak handles the transparent redirection and security assertion back to Keystone. At this point access is granted, assuming the right mappings are in place to grant membership of the user to a group which has permissions within scope of a project. On paper this sounds somewhat convoluted, but in practice it's reasonably slick and intuitive from a user's point of view:

In my case (in the video above), the Horizon login page redirected me to a Keycloak instance, which presented me with three authentication options. I selected the option which deferred this to EGI's Check-in service, in which I again deferred to Google, and I then used my StackHPC credentials. This provided Check-in with some context, which in turn was passed back to Keycloak, and onwards to Keystone. As part of this final step, Keystone is configured to map a particular OIDC claim containing my company affiliation to a stackhpc group which has the member role assigned in a stackhpc project, thus granting me access to resources on this service provider.

This little demonstration neatly shows one of the immediate benefits of introducing Keycloak into your federation infrastructure - being able to maintain control over a diverse selection of potential authentication sources. At this point, I have an identity within Keycloak that an administrator can associate with other AAI primitives, including multiple IdPs, groups, security policies, and so on.
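
For reference, the Keystone side of such a setup is registered using the standard federation workflow. A minimal sketch with the openstack CLI might look like the following (the names keycloak, keycloak_mapping, the protocol name openid and the remote-id URL are placeholders for your own deployment, and the Apache OIDC configuration in front of Keystone is not shown):

openstack identity provider create keycloak \
    --remote-id https://keycloak.example.com/auth/realms/myrealm
openstack mapping create keycloak_mapping --rules keycloak_mapping.json
openstack federation protocol create openid \
    --identity-provider keycloak --mapping keycloak_mapping

The mapping rules file referenced above is the same kind of JSON document discussed in the Authorisation section below.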

Integrating OTP

A little side note on what else is possible with Keycloak, a feature that could be of use even if you aren't interested in delegating authentication to another service. Keycloak provides support for One Time Passwords (OTP) - either time-based or counter-based - via FreeOTP or Google Authenticator. Thus, it's possible to federate your users with something such as Active Directory and at the same time add in another layer of security in the form of two-factor authentication.

Once a user has first signed up to Keycloak, either directly (such as via an invitation link), or indirectly by delegation to another configured IdP, the user can login to Keycloak and associate their login with an authenticator. With that in place, they can then access cloud resources on a given service provider using the credentials for their Keycloak account:

And then when prompted, enter the code generated using the Google Authenticator application:

If that's successful then they're redirected back to Horizon with access to OpenStack resources - secured via the two factors used during the authentication process.

Authorisation

As mentioned earlier, one of the problems we were trying to solve was normalising or having control over the identity-related attributes (claims) that are presented to Keystone. It's these claims that give the cloud administrator control over who gets granted access to what, and services such as Check-in make use of proving the entitlement context by hooking into external attribute sources such as COmanage or Perun. However, Keycloak can also assume the role of these components and populate claims based on its knowledge of a particular user. Let's take a quick look into this mapping process and then what the Keycloak configuration looks like in order to influence this process.

Here's a snippet of the JSON mappings file that's configured in Keystone and consulted whenever authentication is triggered via this IdP (Keycloak):

[
  {
    "local": [
      {
        "group": {
          "id": "44a46f4e41504e01ae77008c88dfc2da"
        },
        "user": {
          "name": "{0}"
        }
      }
    ],
    "remote": [
      {
        "type": "OIDC-preferred_username"
      },
      {
        "type": "HTTP_OIDC_ISS",
        "any_one_of": [
          "https://aai-dev.egi.eu/oidc/"
        ]
      },
      {
        "type": "OIDC-edu_person_scoped_affiliations",
        "any_one_of": [
          "^.*@StackHPC$"
        ],
        "regex": true
      }
    ]
  }
]

This basically tells Keystone to create and associate a federated user with the group that has the ID 44a46.. and a username of whatever OIDC-preferred_username contains, as long as these two conditions are met:

  • The HTTP_OIDC_ISS attribute is https://aai-dev.egi.eu/oidc/;
  • The edu_person_scoped_affiliations claim matches the regular expression ^.*@StackHPC$.

These are standard claims returned from EGI Check-in, however what if we wanted to make use of our own arbitrary grouping so that we're not just relying on the above selection in order to associate users with a particular group? Keycloak lets you create your own group and user associations, and these are then (by default) in scope for claims presented to Keystone for mapping consideration. So we can expand the above example, and perhaps replace the OIDC-edu_person_scoped_affiliations section with something that makes use of what we get via Keycloak:

{
  "type": "OIDC-group",
  "any_one_of": [
    "StackHPC"
  ]
}

Now, any user in Keycloak associated with the 'StackHPC' group, regardless of their authentication source (so long as it's valid!), will be able to access resources with whatever role is associated with the OpenStack group ID shown in the previous example. Here's what it looks like from Keycloak's point of view; in this group I have three users, each of which has a different linked IdP:

If we look at the linked IdP for my personal Google account:

The grouping is an abstraction handled by Keycloak, but it gives us control over which users have access to our OpenStack deployment. This is a simple example and it's possible to do much, much more - including specifying additional attributes to provide further authorisation context, as well as more complicated abstractions such as nested groups - but hopefully this gives you a flavour of what's possible.

Further information

Our proof-of-concept and investigation with federated Keystone would have been immeasurably more difficult if it wasn't for Colleen Murphy's fantastic blog posts, here and here.

It's also worth mentioning that the Keystone team are currently working on developing identity provider proxying functionality, which might make the requirement for Keycloak redundant in a lot of cases. The Etherpad used at the Berlin Summit for the Stein release to gather requirements is here, and there's some more information from the Stein PTG Etherpad here.

Finally, we'd also like to thank our friends and colleagues at the University of Cambridge for their assistance with the infrastructure resources that made this proof-of-concept possible.

by Nick Jones at November 27, 2018 03:00 PM

Trinh Nguyen

Searchlight weekly report - Stein R-20


Last week, we focused on investigating the use cases and some possibilities of implementing the plugin for Octavia and K8S. There's only one issue with [1], which fails the tox test despite only changing the log texts. However, the tests execute successfully in my local environment. My first guess is that there have been some changes in the other upstream projects (projects that we have made plugins for). The plan is to search through other projects' repositories for any new big changes that may affect Searchlight.

[1] https://review.openstack.org/#/c/619162/

by Trinh Nguyen (noreply@blogger.com) at November 27, 2018 01:03 AM

November 26, 2018

Cisco Cloud Blog

Cloud Unfiltered, Episode 59: Introducing StarlingX, with Glenn Seiler

Do you know what StarlingX is? Yeah we didn’t either. But it was just announced at the OpenStack Summit in Berlin, and there seemed to be quite a bit of...

by Ali Amagasu at November 26, 2018 04:43 PM

OpenStack Superuser

Why Monasca and Kolla are a match made in monitoring heaven

Monasca provides monitoring as-a-service for OpenStack. It’s scalable, fault tolerant, supports multi-tenancy with Keystone integration and you can push metrics into it at any sampling frequency you like. You can bolt it on to your existing OpenStack distribution and it will happily go about collecting logs and metrics, not just for your control plane, but for tenant workloads too.

So how do you get started? Errr… well, one of the drawbacks of Monasca’s micro-service architecture is the complexity of deploying and managing the services within it. Sound familiar? On the other hand this micro-service architecture is one of Monasca’s strengths. The deployment is flexible and you can horizontally scale out components as your ingest rate increases. But how do you do all of this?

At StackHPC our answer is to use the OpenStack Kolla project which already provides some of the supporting infrastructure. To this effect we’ve been working with the Kolla community to add support for deploying Monasca over the past couple of cycles.

Here’s a diagram of what it currently looks like:

The important thing to note:  almost everything in the diagram can be deployed in a highly available configuration. What’s even better is that this is all managed by Kolla-Ansible out-of-the-box. If you want a three node Monasca deployment then simply add your three monitoring nodes to the Kolla-Ansible inventory, and follow the Kolla documentation to deploy them. Of course, it’s worth mentioning that currently this will use the same database and load balancing services as other components in Kolla-Ansible. If this is a concern, or you’re not using Kolla-Ansible to manage your OpenStack deployment, then you can deploy Monasca standalone  and integrate, if you wish, with an external instance of Keystone that needn’t be provided by Kolla.
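
For a rough idea of what that involves, here is a sketch of the two pieces of configuration (the enable_monasca flag and the monitoring inventory group reflect our understanding of the current Kolla-Ansible options, so do check the documentation for your release):

# /etc/kolla/globals.yml
enable_monasca: "yes"

# multinode inventory: hosts in this group will run the Monasca services
[monitoring]
monitor01
monitor02
monitor03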

So how can the community get involved? The first answer is by trying it out! By following the Kolla vagrant documentation in conjunction with the Kolla documentation for Monasca it’s remarkably easy to stand up. Once it’s running you’re likely to start thinking of things to monitor. Examples include anything from RabbitMQ cluster queue length to database writes. Adding support in Kolla-Ansible for gathering these metrics out of the box would be a great thing to contribute.

Secondly, we mentioned earlier that nearly all services can be deployed in a highly available configuration. So which ones can't? The Monasca fork of Grafana is one of them, and InfluxDB, which requires an enterprise license for clustering, is the other. Of course, the latter could be solved by buying a license, but Monasca supports Cassandra as a backend and adding support for that to Kolla could make a nice improvement.

Finally, we come to the Monasca Grafana fork, which has been a longstanding fly in the ointment for the Monasca project. The fork arose to realize the vision of being able to log into Grafana with your OpenStack credentials and look through a single pane of glass at the performance of your OpenStack project. This is a vision that we believe is worthwhile, but unfortunately efforts to merge Keystone integration into Grafana have so far failed and the fork has fallen behind. A renewed effort at integrating Grafana with Monasca is required and if you'd like to get involved with that you'll be welcomed with open arms.

About the author

Doug Szumski is a cloud software engineer at StackHPC. This is a follow-up to a post he wrote about Monasca coming to Kolla.

The post Why Monasca and Kolla are a match made in monitoring heaven appeared first on Superuser.

by Superuser at November 26, 2018 03:05 PM

Arie Bregman

OpenStack: Testing Upgrades with Tobiko

What is Tobiko?

Tobiko is an OpenStack upgrade testing framework. It aims to provide tooling and methods for easily developing upgrade type tests.

Note: At this moment it also provides you with several built-in networking tests.

If you are familiar with OpenStack you might wonder why the current OpenStack testing framework (Tempest) is not used for that purpose.

First of all, that’s an excellent question. Secondly, imagine the following scenario: you would like to set up several different OpenStack resources (instances, routers, networks, …) before upgrading the cloud, run the upgrade process and once the upgrade process is finished, run the same or similar tests to check if the resources you created are still there and working properly. Another scenario is to test your cloud during the upgrade and analyze what happens to the different resources while upgrading your cloud.

Tempest is designed to run, test and clean up a resource once the test is over. If you tried to run the same test before and after an upgrade, it would just create and remove the resources in each phase (pre and post) instead of cleaning them up only post-upgrade.

Now, let’s move to the practical part of this post and learn how to use it.

How to install Tobiko?

Run the following code in order to install Tobiko in its own (Python) virtual environment

git clone https://review.openstack.org/openstack/tobiko.git
cd tobiko && pipenv install .

Verify it works by running a simple command like tobiko-list --templates. It should list the Heat templates available in Tobiko:

[abregman]$ tobiko-list --templates

test_floatingip.yaml
test_mtu.yaml

Running tests

Tobiko at this point is coupled to Tempest. It acts as a Tempest plugin, so you execute it the following way:

tempest run --config-file etc/tempest.conf --regex tobiko\.\*test_pre

This will run all the pre-upgrade tests. In order to run the post-upgrade tests, you simply change the regex:

tempest run --config-file etc/tempest.conf --regex tobiko\.\*test_post

To run successfully, it will require you to provide a valid Tempest configuration file, which supplies the input it uses, like the authentication method and the image to use.

For some commands, Tobiko merely requires the authentication info, which can be provided with environment variables, and doesn't require you to pass a whole Tempest configuration file.
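
For example, assuming Tobiko picks up the standard OpenStack client environment variables (the values below are placeholders):

export OS_AUTH_URL=https://keystone.example.com:5000/v3
export OS_USERNAME=admin
export OS_PASSWORD=secret
export OS_PROJECT_NAME=admin
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_DOMAIN_NAME=Default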

How does it work?

Each test class in Tobiko is associated with a stack/template. When a test runs, it uses the template (which is located with other templates in tobiko/tests/scenario/templates) to create the stack.

Both the stack and the template are named the same as the test file. This means that if your test file is called “test_floatingip.py” then the stack name is “test_floatingip” and the template name is “test_floatingip.yaml”.

If the stack already exists, Tobiko will simply skip the stack creation and proceed to run the tests.

Remember, once the test is over it will not remove the resources/stack. There is a utility called tobiko-delete that allows you to remove resources created by Tobiko. We'll cover it later on.

Adding a test

First, add a setUp method that will call the base class (the one used by each test class) to create the stack required by the tests. In order to create the stack, the base class needs the file name of the test class:

    def setUp(self):
        super(<YOUR_TEST_CLASS>, self).setUp(__file__)

Next, make sure you have a test for each phase – pre and post. They should be called test_pre_… and test_post_… to match the other existing tests included in Tobiko. A minimal sketch of such a test class is shown below.
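
A minimal sketch of what such a test file (for example test_floatingip.py) might contain; the base class import path and the ScenarioTestsBase class name are assumptions here, so check tobiko/tests/scenario for the real base class:

from tobiko.tests.scenario import base  # assumed location of the scenario base class


class FloatingIPTest(base.ScenarioTestsBase):  # base class name is illustrative

    def setUp(self):
        # The stack and template names are derived from this file name,
        # so test_floatingip.py maps to the test_floatingip.yaml template.
        super(FloatingIPTest, self).setUp(__file__)

    def test_pre_floatingip_reachable(self):
        # Pre-upgrade phase: the stack has been created by setUp, so
        # check here that the floating IP it defines is reachable.
        pass

    def test_post_floatingip_reachable(self):
        # Post-upgrade phase: the stack created before the upgrade is
        # reused (not recreated), so verify the same floating IP still responds.
        pass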

Tobiko CLI

Tobiko provides you with tooling in order to ease the process of developing new tests and testing related resources.

tobiko-list

tobiko-list lists the templates defined in Tobiko

tobiko-list --templates

test_floatingip.yaml
test_security_groups.yaml

It can also be used to list the stacks in your OpenStack project, but only those that are related to Tobiko (either created by it or matching a template name defined in the project):

tobiko-list --stacks

test_floatingip

tobiko-create

If you would like only to create the stacks defined in Tobiko and not run the tests, you can use tobiko-create for that purpose.

You can either specify a specific stack the following way

tobiko-create -s test_floatingip

Or create all stacks

tobiko-create --all

tobiko-delete

Once you have finished testing/developing, you can use tobiko-delete to clean up resources created during your development/testing. Similarly to tobiko-create, you can delete a specific stack:

tobiko-delete -s test_floatingip

Or all stacks

tobiko-delete --all

Don't worry about deleting all the stacks in your project. It will only delete stacks created by Tobiko or those that match template names.

Contributions

Tobiko code review is managed via the OpenStack Gerrit system. Follow the OpenStack Developer guide to submit patches. The short version is: clone the project, commit your changes and run git-review to submit your patch for review. The long version is well documented by OpenStack.

by Arie Bregman at November 26, 2018 02:59 PM

The Official Rackspace Blog

Platform Considerations Once You’ve Chosen a Private Cloud

Public clouds are the right choice for many organizations, and across a wide variety of use cases. Certain requirements, however — often some combination of regulation and compliance, security, data proximity, ease of migration, application suitability and cost — can make private cloud the right choice. Once that decision has been made, organizations are wise […]

The post Platform Considerations Once You’ve Chosen a Private Cloud appeared first on The Official Rackspace Blog.

by Steve Garone at November 26, 2018 02:13 PM

SUSE Conversations

SUSE OpenStack Cloud 9 Beta 4 is out!

We are happy to announce the release of SUSE OpenStack Cloud 9 Beta 4 ! Please check out our main SUSE OpenStack Cloud Beta page for more information: https://www.suse.com/betaprogram/cloud-beta/ SUSE OpenStack Cloud 9 focuses on four major topics: OpenStack Rocky Release: new upstream OpenStack content, Update to SUSE Linux Enterprise Server 12 SP4: The latest […]

The post SUSE OpenStack Cloud 9 Beta 4 is out! appeared first on SUSE Communities.

by Vincent Moutoussamy at November 26, 2018 01:59 PM

OpenStack: Digital Transformation for Service Providers

What is digital transformation? Digital transformation is the ‘catch-all’ phrase used to describe the acceleration of IT development to meet the fundamental changes in how businesses operate and deliver value to their customers. While this is different in many industry verticals, companies share some common characteristics on this internal-culture changing journey. Digital transformation means more […]

The post OpenStack: Digital Transformation for Service Providers appeared first on SUSE Communities.

by jvonvoros at November 26, 2018 01:00 AM

November 24, 2018

Chris Dent

Mailing List Review

OpenStack is in the process of merging several of its mailing lists into one, openstack-discuss. The hope is to break down some of the artificial and arbitrary boundaries between developers, users, operators, deployers, and other "stakeholders" in the community. We need and want to blur the boundaries. Everyone should be using, everyone can be developing.

The advent of the new list is perhaps a good time to remind people how to use a collaborative mailing list. There's some old good advice on the OpenStack wiki but it seems if people do read that they certainly don't follow the guidance described there. So, I thought I'd try to recapitulate some of the rules from a different angle.

Your main goal as a member of a mailing list is to keep the archive as useful as possible for other members of the community. Do that, and the dynamic activity on the list itself also manages to be useful.

If you think from that point of view, then the following rules may start to make some sense: You're trying to make the archive readable for the people who come later. It's the same as code: you're not trying to make it maintainable by you. It's not about you. It's about other people. Who aren't there right now.

  • Always respond to the list and only to the list, otherwise threads drift away from the archive, and information is lost.
  • Always use plain text in your email, as that's what the archive is best at displaying in a readable fashion.
  • Always trim your responses to only quote the parts of the message you are responding to. If there is a healthy archive, if the reader needs more context they can get it from the archive.
  • Always, after trimming, respond inline.
  • Always make sure there is visible vertical whitespace between your text and quoted text.
  • Use a mailer that visibly quotes content in responses, using a character (> is the norm), not color or markup.
  • Don't use attachments. If you need to add some non-textual content, put it on the web somewhere and link to it.
  • Encourage awareness of the archive by linking to it often from blog posts, IRC, commit messages, and other mail messages.

Thanks.

by Chris Dent at November 24, 2018 12:15 PM

November 23, 2018

Chris Dent

Placement Update 18-47

It's been a while since the last placement update. Summit happened. Seemed pretty okay, except for the food. People have things they'd like to do with placement.

Most Important

We're starting to approach the point where we're thinking about the possibility of maybe turning off placement-in-nova. We're not there yet, and as is always the case with these kinds of things, it's the details at the end that present the challenges. As such there are a mass of changes spread around nova, placement, devstack, grenade, puppet and openstack-ansible related to making things go. More details on those below, but what we need is the same as ever: reviews. Don't be shy. If you're not a core or not familiar with placement, reviews are still very helpful. A lot of the patches take the form of "this might be the right way to do this".

What's Changed

There is now a placement-manage command which can do database migrations, driven by alembic. This means that the devstack patch which uses the extracted placement can merge soon. Several other testing related (turning on tempest and grenade for placement) changes depend-on that.

Matt did a placement-status command which has a no-op we-are-here upgrade check. We've already met the python3 goals (I think?), so I reckon placement is good to go on community-wide goals. Woot.

The PlacementFixture that placement provides for other projects to do functional tests with it has merged. There's a patch for nova using it.

The spec for counting quota usage in placement has been revived after learning at summit that a proposed workaround that didn't use placement wasn't really all that good for people using cells v2.

Bugs

Specs

Summit and U.S. Thanksgiving has disrupted progress on some of these, but there are still plenty of specs awaiting their future.

Many of these have unaddressed negative review comments.

Main Themes

Making Nested Useful

Progress is being made on gpu-reshaping for libvirt and xen:

Making use of nested is bandwidth-resource-provider:

Somewhat related to nested are a stack of changes to how often the ProviderTree in the resource tracker is checked against placement, and a variety of other "let's make this more right" changes in the same neighborhood:

Extraction

(There's an etherpad which tracks some of the work related to extraction. Please refer to that for additional information.)

TripleO and OpenStack-Ansible are both working on tooling to install and/or upgrade to extracted placement:

libvirt support for GPU reshaping:

Grenade and tempest testing for extracted placement:

A replacement for placeload performance testing that was in the nova-next job: https://review.openstack.org/#/c/619248/. This might be of interest to people trying to do testing of live services without devstack. It starts with a basic node, turns on mysql, runs placement with uwsgi, and does the placeload testing. Note that this found a pretty strange bug in _ensure_aggregates.

Documentation tuneups:

We've been putting off making a decision about os-resource-classes. Anyone have strong opinions?

Other

Besides the 20 or so open changes in placement itself, and those mentioned above, here are some other changes that may be of interest.

End

Lot going on. Thanks to everyone for their contributions.

by Chris Dent at November 23, 2018 03:01 PM

ICCLab

SC2 2018 – The 8th IEEE International Symposium on Cloud and Services Computing

The 8th IEEE International Symposium on Cloud and Services Computing (IEEE SC2) 2018 took place in Paris, France, from the 19th to the 22nd of November. The conference was co-located with two more events, namely the 5th International Conference on Internet of Vehicles (IOV) 2018 and the 11th IEEE International Conference on Service-Oriented Computing and Applications (IEEE SOCA) 2018.

We from the ICCLab participated in the first three days of the SC2 conference, as its focus on Cloud-related topics matches our expertise and research interests. The main themes in focus were Cloud Platforms and Services, Networking and Services, and Cloud and SOA Services.

As an important venue for researchers and industry practitioners, SC2 offered the opportunity to exchange information about recent advancements in IT-driven cloud computing technologies and services. The conference hosted a good number of participants in a familiar context where interacting with peers was easy, not least over the coffee and lunch breaks. In total 68 oral presentations were planned in 15 sessions. Additionally 10 posters were presented in a poster session and 3 keynotes were organized. The conference was organized over three days, with the first day being dedicated to tutorials and the next days with parallel sessions for each of the co-located conferences.

Besides attending the event itself, the main motivation to visit the SC2 conference was to present our paper entitled “Hera Object Storage: A seamless, Automated Multi-Tiering Solution on Top of Openstack Swift”. In the presented paper we highlighted some of our recent results from research in the field of Cloud storage. In particular, the focus of the contribution is the fast-growing field of unstructured data storage in distributed clouds. We proposed an object storage solution built on top of OpenStack Swift. This solution is able to apply multi-tiered storage to unstructured data in a seamless and automatic manner. The object storage decisions are taken based on the data temperature, in terms of current access rate.

The first day of the conference was the day of my arrival. The first tutorial I could attend gave insights into NVIDIA and its activities in the automotive industry. Various interesting results were presented, supported by real-world test videos. We could see how NVIDIA as a market leader supports manufacturers in building self-driving cars. We could appreciate how a full range of real-world conditions influencing traffic could be handled. The amount of work behind these results was probably not completely clear to many of us, but the needed hardware and software infrastructure was clearly huge!

The second talk of the day showed an interesting business model presented by Qarnot Computing, France. The model they presented promoted a solution where computing and heating are delivered from a cloud infrastructure. The solution is based on a geo-distributed cloud platform with server nodes named digital heaters. Each heater embeds processors or GPU cards and is connected to the heat diffusion system. With this solution, homes, offices and other buildings can be heated through the distributed data center, which is able to balance the requests in computation and heating.

The last tutorial of the day covered some basics of machine learning for unsupervised algorithms. A review of the applications and the challenges faced when dealing with data sets was also given.

The second day started with the official opening of the conferences and the presentation of the program. This was followed by a keynote on scheduling methods for elastic services, as part of a project driven by the AlterWay company in France. The rest of the day we had two conference sessions and a further keynote speech where cybersecurity, with its links to geopolitical issues, was in focus.

In the first SC2 session I attended we could follow presentations about the following topics: a comparison between unikernels and containers, user plane management for 5G networks, a cost analysis of virtual machine live migration, and two papers on automated tiered storage solutions, one of which I presented myself. The second session was dedicated to work-in-progress papers, covering topics like contextual information searching for encrypted data in cloud storage services, smart contracts with on- and off-blockchain components, and cloud-native 5G virtual network functions.

In the evening we could enjoy a banquet on the Seine river with all the conference attendees. The cruise on the Seine brought us close to the main sightseeing attractions of beautiful Paris. A fish-based dinner was served, completing a perfect setting to exchange experiences with other conference participants.

Day 3 started with an enlightening keynote speech by Prof. Cesare Pautasso from the University of Lugano, Switzerland, who described recent trends in software development. These are dictated by the current scenario where end-users have multiple devices to access their data and content and to manage their personal information. To best manage such a complex multi-device user environment, Liquid software is needed, whereby software can seamlessly flow and adapt to the different devices.

After the last session, with some interesting papers presenting, among others, solutions for multi-objective scheduling in cloud computing and confidentiality and privacy issues in the Cloud, it was time to head back home. Our participation in the SC2 conference was definitely positive and we will surely consider next year's edition as a possible venue to share our new research experience.

by milt at November 23, 2018 02:01 PM

November 22, 2018

Fleio Blog

Fleio 1.2: instance backup, IPv6 firewall rules, TLDs custom fields and more

We’ve just released Fleio 1.2. OpenStack instance backup, IPv6 firewall rules, TLDs custom fields and many more. Here are some of the improvements: New OpenStack instance can boot from existing volume OpenStack instances backup feature End-users can upgrade the service and invoice is automatically generated for the difference Improve payment journal display Add IPv6 to […]

by adrian at November 22, 2018 03:56 PM

Adam Young

Scoped and Unscoped access policy in OpenStack

Ozz did a fantastic job laying out the rules around policy. This article assumes you’ve read that. I’ll wait.

I’d like to dig a little deeper into how policy rules should be laid out, and a bit about the realities of how OpenStack policy has evolved.

OpenStack uses the policy mechanisms described there to limit access to various APIs. In order to make sensible decisions, the policy engine needs to know some information about the request, and the user that is making it.

There are two classes of APIs in OpenStack, scoped and unscoped. A scoped API is one where the resource is assigned to a project, or possibly, a domain. Since domains are only used for Keystone, we'll focus on projects for now. The other class of APIs is where the resources are not scoped to a project or domain, but rather belong to the cloud as a whole. A good example is a Nova Hypervisor.

The general approach to accessing scoped resources is to pass two checks. The first check is that the auth-data associated with the token has one of the appropriate roles in it. The second is that the auth-data is scoped to the same project as the resource.

As an example, let’s look at the Cinder API for volumes. The API to create a new volume is:

POST /v3/{project_id}/volumes

and the API to then read the volume is

GET /v3/{project_id}/volumes/{volume_id}

The default policy.yaml entries for these APIs are:

# Create volume.
# POST /volumes
#"volume:create": ""

# Show volume.
# GET /volumes/{volume_id}
#"volume:get": "rule:admin_or_owner"

We’ll dig a little deeper into these in a moment.

One thing that distinguishes Cinder from many other APIs is that it
includes the project ID in the URL. This makes it easier to see what
the policy is that we need to enforce. For example, if I have a
Project ID of a226dc9813f745e19ece3d60ac5a351c and I want to create a
volume in it, I call:

POST https://cinderhost/v3/a226dc9813f745e19ece3d60ac5a351c/volumes

With the appropriate payload. Since the volume does not exist yet, we
have enough information to enforce policy right up front. If the
token I present has the following data in it:

{
  "token": {
    "methods": [
      "password"
    ],
    "roles": [
      {
        "id": "f03fda8f8a3249b2a70fb1f176a7b631",
        "name": "Member"
      }
    ],
    "project": {
      "id": "a226dc9813f745e19ece3d60ac5a351c",
      "domain": {
        "id": "default",
        "name": "Default"
      },
      "enabled": true,
      "description": null,
      "name": "tenant_name1"
    }
  }
}

Let’s take another look at the policy rule to create a volume:

"volume:create": ""

There are no restrictions placed on this API. Does this mean that
anyone can create a volume? Not quite.

Just because oslo-policy CAN be used to enforce access does not mean
it is the only thing that does so. Since each of the services in
OpenStack has had a long life of its own, we find quirks like this.
In this case, the URL structure that has the project ID in it is
checked against the token externally to the oslo-policy check.

It also means that no role is enforced on create. Any user with any
role on the project can create a volume.

What about afterwards? The rule on the get command is

"volume:get": "rule:admin_or_owner"

Here’s another gotcha. Each service has its own definition of what is
meant by an owner. You need to look at the service-specific definition
of the rule to see what this means.

# Default rule for most non-Admin APIs.
#"admin_or_owner": "is_admin:True or (role:admin and is_admin_project:True) or project_id:%(project_id)s"

If you have understood the earlier two articles, you should be able to
interpret most of this rule. Let’s start with the rightmost section:

`or project_id:%(project_id)s`

The or rule means that, even if everything before it failed, we can
still pass if we pass just the part that follows. In this case, it is
doing the kind of scope check I described above: that the project_id
from the token’s auth-data matches the project_id on the volume
object. While this is Cinder, and it is still doing the check based
on the URL, it also checks based on the resource, in this case the
volume. That means that this check can’t happen until Cinder fetches
the volume record from the database. There is no role check on this
API. A user with any role assigned on the project will be able to
execute the API.
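
To make that concrete, here is a minimal sketch that evaluates the rule with oslo.policy directly, outside of any service. The creds and target dictionaries are invented for illustration; a real service builds them from the token’s auth-data and the database record, and the rule strings are copied from the Cinder defaults above.

# Minimal sketch: evaluating Cinder's admin_or_owner rule with oslo.policy.
from oslo_config import cfg
from oslo_policy import policy

cfg.CONF([])  # initialise an empty configuration
enforcer = policy.Enforcer(cfg.CONF)
enforcer.set_rules(policy.Rules.from_dict({
    "admin_or_owner": ("is_admin:True or "
                       "(role:admin and is_admin_project:True) or "
                       "project_id:%(project_id)s"),
    "volume:get": "rule:admin_or_owner",
}))

# creds stands in for the token's auth-data; target is the volume record
# fetched from the database. Both are made-up example values.
creds = {"roles": ["Member"], "project_id": "a226dc9813f745e19ece3d60ac5a351c"}
target = {"project_id": "a226dc9813f745e19ece3d60ac5a351c"}

print(enforcer.enforce("volume:get", target, creds))                   # True
print(enforcer.enforce("volume:get", {"project_id": "other"}, creds))  # False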

What about the earlier parts of the rule? Let’s start with the part we
can explain with the knowledge we have so far:

`role:admin`

This is a generic check that the user has the role assigned on the
token. If we had looked at this rule a couple of years ago, this would have
been the end of the check. Instead, we see it is coupled with

`and is_admin_project:True`

This is an additional flag on the token’s auth data. It is attempting
to mitigate one of the oldest bugs in the bug tracker.

Bug 968696: “admin”-ness not properly scoped

Another way to describe this bug is to say that most policy rules were
written too permissively. A user that was assigned the `admin` role
anywhere ended up having `admin` permissions everywhere.

This breaks the scoping concept we discussed earlier.

So, what this flag implies is that the project that the user’s token is scoped
to is designated as the `admin` project in Keystone. If this is the
case, the token will have this additional flag set.

Essentially, the `admin` project is a magic project with elevated
privileges.

This provides a way to do cloud-wide administration tasks.
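
For reference, the admin project is designated in keystone.conf; a minimal sketch (the project and domain names here are just examples) looks roughly like this:

# keystone.conf (illustrative values)
[resource]
# Tokens scoped to this project carry is_admin_project=True in their auth data.
admin_project_name = admin
admin_project_domain_name = Default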

What about that first rule:

`is_admin:True`

This is a value set by the code inside the Cinder service. A similar
pattern exists in most OpenStack projects. It is a way for Cinder
to override the policy check for internal operations. Look
in the code for places that call get_admin_context(), such as:

volume_types = db.volume_type_get_all(context.get_admin_context(), False)

What about those unscoped APIs we were looking at earlier? It turns
out they are mostly implemented with the first half of the Cinder
rule. For example, the update cluster API has the policy rule

# PUT /clusters/{cluster_id}
"clusters:update": "rule:admin_api"

which is implemented as

# Default rule for most Admin APIs.
"admin_api": "is_admin:True or (role:admin and is_admin_project:True)"

One requirement that the operator community had was that they needed to be able to do cloud-wide operations, even when the operations should have been scoped to a project. Listing all VMs, listing all users and other such operations were allowed to happen with admin-scoped tokens. This really obscured the difference between global and project-scoped operations.

The is_admin_project hack works, but it is a bit esoteric. One current effort in the Keystone community is to do something a little more readable: actually have proper scoping for things that are outside of projects. We are calling this service scoping. Service-scoped roles are available in the Rocky release, and can be used much like is_admin_project to mitigate bug 968696.

by Adam Young at November 22, 2018 02:56 PM

November 21, 2018

SUSE Conversations

My Top 5 Takeaways from OpenStack Summit Berlin

Another outstanding OpenStack Summit is done and dusted.  As expected, Berlin was a magnificent venue and it was truly a delight to reconnect with old friends and spend valuable time with colleagues and customers across the community. After such a dynamic event, here are my top 5 takeaways: 1.    OpenStack is a great community doing […]

The post My Top 5 Takeaways from OpenStack Summit Berlin appeared first on SUSE Communities.

by Terri Schlosser at November 21, 2018 12:00 PM

November 19, 2018

OpenStack Superuser

OpenStack Summit Berlin recap: 51 things you need to know

Berlin — A light fall rain couldn’t dampen the enthusiasm of over 2,700 stackers, more than 50 percent of whom came to the German capital from 63 countries.

If you didn’t attend—or if you did and want a replay—Superuser collected the announcements, user stories and Forum discussions you may have missed.

Jump to roadmap & technical decisions
Jump to users in production
Jump to news from the OpenStack ecosystem
Jump to what’s next

Berlin Summit was the place to be

  • Five headline sponsors supported the conference in Berlin, the most ever. Attendance is up from the last two Summits.
  • The edge computing hackathon, hosted by Deutsche Telekom on Saturday and Sunday, sold out. There was a tie for first place: one team built multi-cloud functionality for OpenStack Client, and the second team devised a home-built environmental control and management system that is automated using IoT devices.
  • Ceph held a dedicated day on the Monday before the Summit where the community announced the launch of Ceph Foundation, under the umbrella of the Linux Foundation.
  • The Summit featured an incredible lineup of new OpenStack users speaking, plus lots of sessions about new OpenStack Foundation (OSF) pilot projects and adjacent community collaborations.

Spotlight on new and growing users, strong contributor metrics

  • New OpenStack users speaking at the Summit included Leboncoin, Metronom, Oerlikon Manmade Fibers, SBAB Bank, UK Science & Technology and Volkswagen Financial Services.
  • Growing OpenStack users: Workday talked about scaling from 50,000 to 300,000 cores. BMW discussed their Zuul use case in addition to OpenStack, and eBay Classifieds Group shared how they updated their OpenStack cloud for Spectre and Meltdown.
  • OpenStack is one of the top three open-source projects in the world. The other two are Linux and Chromium (the upstream of Chrome browser). There were 70,000 commits to OpenStack in the last year, clocking an average of 182 changes per day in the Rocky release.
  • The OpenStack 2018 User Survey Report went live Tuesday morning at the Summit, highlighting a growth in OpenStack Ironic bare metal clouds fueled by Kubernetes adoption.
  • A recent report from CCW Research and approved by the Chinese Ministry of Industry and Information Technology named the top 20 private cloud providers in China. Four out of the five top providers are OpenStack-based companies and 14 of the top 20 providers are based on OpenStack. More details here.

OpenStack Foundation showcases new features, roadmaps and use cases of open infrastructure pilot projects

  • The Airship team announced delivery of their release candidate on the road to 1.0 early next year. Airship is currently being used in production by AT&T and SK Telecom, and AT&T presented a 5G demo on the keynote stage Wednesday, featuring the lifecycle management technology. 99cloud, Ericsson and several other organizations are also getting involved.

  • The Kata Containers community is working on their 1.4 release, expected to arrive shortly. Recently the community hosted a meetup in China designed for large cloud providers—including Alibaba, Baidu and Tencent—to share their adoption plans and provide feedback on the software roadmap.
  • StarlingX recently celebrated their first release on October 24 with 84 contributors from 99cloud, China UnionPay, Fujitsu, Intel, NEC, SUSE and Wind River, among others. They also recently established their technical steering committee and provided an update during the Joint Leadership meeting on Monday.
  • Zuul users BMW and Leboncoin both presented case studies at the Berlin Summit. Nodepool driver for Kubernetes is in review, and support for Azure, GCP, Gitlab and Pagure are in the works. Read more here.
  • OpenStack Foundation board expands mission to host new open source projects as part of Open Infrastructure transformation. The board of directors of the OpenStack Foundation (OSF) today adopted a resolution advancing a new governance framework supporting the organization’s investment in emerging use cases for OpenStack and open infrastructure. These include continuous integration and continuous delivery (CI/CD), container infrastructure, edge computing, datacenter and, newly added, artificial intelligence/machine learning (AI/ML). The board resolution, approved in a meeting held in Berlin on Monday, authorizes the officers of the OSF to select and incubate Pilot projects. Full press release
  • The Four Opens book launch. The Four Opens (open source, open development, open design and open community) were created in 2010 as founding principles to sustain the creation and growth of the OpenStack project. Now you can read (and help contribute) to them in book form. “We’ve collected these notes and have written some seeds to start this document,” says OSF’s Chris Hoge. “I’ve staged this work into GitHub and have prepared a review to move the work into OpenStack hosting, turning this over to the community to help guide and shape it.”

Here’s an overview of some of key community activities, technical decisions and roadmap discussions:

  • The community voted to name the upcoming OpenStack release Train. The second release for 2019 is estimated to arrive in August.
  • Some community goals up for discussion for the T release include dropping Python 2 support; improving cloud reliability by testing, reporting and enforcing self consistency; health checks for clusters in operation; and API consistency, allowing breaking changes. Etherpad discussion here.
  • A cross-technical leadership session (with technical leaders from OpenStack, Kata Containers, StarlingX, Airship, and Zuul) was held to compare the initial governance structures for the pilot projects and cover pain points. Etherpad discussion here.
  • Nine community members were recognized by the Community Contributor Awards with quirky categories like the Bonsai Caretaker Award.
  • The OpenStack community and Superuser editorial advisors weighed in on the finalists and chose City Network at the OpenStack Berlin Summit Superuser Awards, sponsored by Zenko.

You can find all of the Etherpads from the Forum here.

Now, a word from open infrastructure users in production

  • Adobe Advertising Cloud shared its upgrade journey and described how, by leveraging Canary Releases techniques, the team managed to do complete infrastructure upgrades and migrate workloads between two OpenStack environments with no impact for stakeholders and users.
  • AT&T took to the keynote stage to talk about Fifth Generation Wireless (#5G), the first generation of mobile wireless services developed on, and running in, the cloud. AT&T is rolling out its 5G network later this year, which will run on OpenStack and be deployed by the Airship project. Catch the demo
  • BMW’s Tobias Henkel talked about what’s driving the automaker forward as continuous integration becomes key to his industry, and explained why BMW is using Zuul for future automotive projects.
  • China UnionPay shared the results of a joint collaboration project with China’s Fudan University that aims to develop a converged virtualized and containerized architecture for financial big data applications.
  • eBay Classifieds Group shared how they handled Spectre and Meltdown. The group has a private cloud distributed in two geographical regions with about 1,000 hypervisors and an 80,000-core capacity. Once the vulnerabilities came to light, they needed to patch hypervisors on four availability zones for each region with the latest kernel, KVM version and BIOS updates.
  • Leboncoin is a French classifieds company. They shared how OpenStack tools—especially CI—have helped them fuel the growth that has made them the fifth most clicked website in the country.
  • Metronom, the IT supplier for Metro AG, a wholesaler operating in 25 countries, related its experience with OpenStack and how open source influences its internal culture.
  • NASA Goddard’s NASA Center for Climate Simulation described the challenges and the innovative solutions devised on the journey to provide an on-premises private cloud including: telemetry/billing, data protection/DR, security, “cloudifying” workloads, containers and guiding HPC users through the paradigm shift to cloud computing.
  • Oath led a hands-on workshop around upgrading an OpenStack environment, sharing their experiences with fast forward database migration across multiple versions and components of OpenStack at massive scale.
  • Oerlikon ManMade Fibers shared how half the yarn production in the world is powered by its OpenStack-based systems. OpenStack’s open infrastructure is the foundation of Oerlikon Group’s solution architecture, allowing Oerlikon to utilize machine learning and AR/VR to realize its vision to create the world’s leading smart factory. More from the keynote.
  • OVH CTO Alain Fiocco described from the keynote stage what makes this large infrastructure provider different, how OpenStack and open infrastructure help OVH innovate, and what the daily life of a large scale operator looks like.
  • The Pawsey Supercomputing Centre supports university and industry researchers. Aside from HPC facilities, they also have a cloud service called Nimbus deployed with OpenStack Pike. They shared issues and caveats of their deployment via Puppet and MAAS, compared bare metal performance, and showed the capabilities of GPU-to-GPU RDMA between VMs on different physical nodes.
  • SBAB Bank, Sweden’s fifth largest bank, shared how OpenStack enables digital speed and flexibility using City Cloud for Bank & Finance, powered by a managed private cloud.
  • The UK’s Science and Technology Facilities Council, which provides around 4,000 vCPU cores and supports thousands of scientists, related its story of upgrading OpenStack from Mitaka to Queens.
  • Verizon participated in multiple sessions, sharing valuable edge computing insights from their experience running applications and hardware at the edge.
  • Volkswagen came to the keynote stage to talk about their OpenStack private cloud. “Our users can get what they need from cloud really fast, and we’re proud of that,” says Volkswagen head of server operation Tilman Schulz. VW will use OpenStack to support its new online and mobile services. They have two clouds running the Newton release and one cloud on the Mitaka release, and they are managing in-place upgrades, moving from OVS to Juniper Networks Contrail, and using Ceph for storage. Catch his chat with OSF’s Lauren Sell during Tuesday’s keynote here.
  • Volkswagen Financial Services is a wholly owned subsidiary of Volkswagen AG that operates and coordinates the financial services activities of the Volkswagen Group throughout the world. Stefan Kroll, product owner for OpenStack and the lead for the MultiCloud Computing project, participated in an industry panel on data protection that included Trilio Data, Red Hat and CSI Piemonte.
  • Workday’s OpenStack cloud has grown from a 600-server fleet in 2016 to 4,600 servers by the end of 2018. Workday shared operational and scaling challenges of growing from 50,000 to 300,000 cores and their plans to keep improving operational excellence.

This just in from the OpenStack ecosystem

  • Component Soft to surpass 2,000 OpenStack and Kubernetes course participants by the end of 2018. The leading OpenStack Training Partner and Kubernetes Training Partner in Europe will reach this milestone with its OpenStack and Kubernetes trainings, two hot complementary technologies in the open cloud marketplace. While roughly two thirds of this number has come from its OpenStack courses since 2014, Kubernetes has grown quickly since 2017. Customers are typically telco and IT giants such as Ericsson, Deutsche Telekom Group and BAE Systems, as well as local IT firms and start-ups from across Europe. These courses also prepare participants for the relevant COA and CKA exams.
  • Deutsche Telekom: More security ex works. With its November release, the Open Telekom Cloud now offers more IT security ex works. Thanks to encryption for the Workspace Service and Mongo DB, companies can now quickly and easily protect their digital workstations as well as the scalable, relational databases in the Open Telekom Cloud. There are also further improvements to Telekom’s public cloud offering based on OpenStack in the areas of services and management. Full press release
  • Objectif Libre released a new automation tool for OpenStack Cloud based on Ansible called Build & Lifecycle Manager. This tool, based on Ansible, aims to simplify the deployment and upgrades of OpenStack clouds – community version. Customizable for each customer environment, the tool is born from the expertise of the company’s consultants and is maintained with OpenStack releases. Hence, customers can benefit from updates and functional upgrades from the company’s OpenStack experts.
  • OVH public cloud arrives in the U.S. The OVHcloud Public Cloud is now available in the company’s East Coast data center, with the West Coast data center expected to go live in January 2019. The public cloud offers block, archive and object storage powered by OpenStack Swift and Ceph. OVH currently runs 28 data centers, 360,000 servers, 1.4 million customers, 260 instances in production and 150 petabytes of storage with Swift.
  • ScaleUp and HKN present de:stack – the IaaS cloud for the German SME Market. Together ScaleUp and HKN have built the new IaaS cloud platform de:stack based on OpenStack. The goal is to offer customers a secure, GDPR-compliant alternative to the cloud offerings of AWS, Google and others. All data center locations of de:stack are located in Germany and ISO-27001 certified. Berlin and Hamburg are already available online with Duesseldorf launching in December 2018. Another de:stack location, Frankfurt, is planned for 2019.
  • SoftIron joins founding board of the Ceph Foundation. Alongside other industry heavyweights such as Red Hat, China Mobile and Canonical, SoftIron will play a pivotal role in the leadership and direction of the future development of Ceph, the leading open-source data storage platform that supports scalable object-, block- and file- level storage.
  • Storage Made Easy brought its accelerated data transfer File Fabric solution to the OpenStack Summit in Berlin. Storage Made Easy™ (SME) announced they are presenting their Enterprise File Fabric solution at the OpenStack Summit in Berlin. With digital transformation, petabyte-scale object storage becomes commonplace, enabling performance, reliability and scalability. With increased storage usage, new challenges are faced: companies are creating large amounts of new digital data which needs to be moved at high speed between users and the object storage, and between object storage and other storage tiers.
  • SUSE paired with QSC AG to help its colocated customers increase the flexibility and value of their services by enabling easier access to cloud computing resources. QSC shared their journey to build a hybrid cloud based on OpenStack and Ceph. Learn about how they increased flexibility for their customers without risking security.
  • VEXXHOST announces enterprise-grade GPU instances to expand high-performance cloud computing offering. Canadian cloud provider VEXXHOST announced the launch of its latest cloud offering: enterprise-grade GPU instances on OpenStack® based public, private and hybrid cloud. The new GPU offering will empower users working with compute-intensive and high-performance computing (HPC) applications—such as AI, machine learning, blockchain and big data—to increase productivity and reduce costs. Press release
  • Wind River and CENGN help open source devs accelerate StarlingX adoption for edge. Wind River and CENGN are working together to create a public repository to host StarlingX resources as a reference for the open source community. As a source-code-only project without any .iso or binary image available, interested developers have been challenged to quickly get on board and meaningfully experiment with StarlingX. With this new public repository, visitors will soon be able to download an ISO image and gain access to the upstream components. Currently, developers can access CENGN’s mirror site for StarlingX, as well as check for updates about the repository, at http://mirror.starlingx.cengn.ca/mirror/.
  • Yahoo! JAPAN/Actapio selects Quobyte as storage foundation for their private cloud infrastructure.  Quobyte® Inc., a leading developer of modern storage system software, announced today that Actapio, Inc., a U.S. subsidiary of Yahoo Japan Corporation, selected Quobyte to provide the storage platform for its data centers. Quobyte’s Data Center File System provides Yahoo! JAPAN/Actapio with a massively scalable and fault-tolerant storage infrastructure that meets the needs of the internet giant as it increases its focus on application development and operation. Yahoo! JAPAN presented a session on their deployment of OpenStack with Quobyte.
  • ZTE: build a 5G-ready cloud infrastructure to accelerate the large-scale launch of 5G. ZTE unveiled the 5G-Ready 4MIX Distributed Cloud solution based on a three-layer distributed deployment architecture. It leverages multiple advanced technologies, including: a high-performance resource pool adopting a variety of hardware acceleration technologies (e.g. FPGA-based SmartNICs) with a unified portal, to flexibly adapt to upper-layer applications; an OpenStack + Kubernetes dual-core-driven cloud platform, to achieve the unified management and on-demand scheduling of virtual machine (VM) and container resources; and a combined O&M mode based on remote control, AI and other technologies, achieving unmanned operation at the network edge and constructing end-to-end closed-loop automated O&M across the whole network, to maximally free up manpower.

What’s next

That’s a strong finish for the Berlin Summit, but we’re already thinking about our next run. We’re taking the Open Infrastructure Summit to Denver!

Videos for Berlin Summit sessions will be uploaded shortly on the OpenStack website.

The post OpenStack Summit Berlin recap: 51 things you need to know appeared first on Superuser.

by Superuser at November 19, 2018 03:39 PM

SUSE Conversations

The Opportunity in OpenStack Cloud for Service Providers

Authored by Mike Kerr, Director, North America Channels & Alliances   Helping Your Clients Embrace the Cloud Can Reap Big Dividends Digital transformation is affecting every industry, from manufacturing to hospitality and government to finance. As a service provider, you’ve probably seen how this period of rapid change is disrupting your customers—causing both stress and […]

The post The Opportunity in OpenStack Cloud for Service Providers appeared first on SUSE Communities.

by agreis at November 19, 2018 03:23 PM

Trinh Nguyen

Searchlight weekly report - Stein R-21


Everybody was quite busy with the OpenStack Berlin Summit this week, so nothing big happened. There are only a few things worth mentioning:

  • It looks like the two core reviewers lei-zh and Kevin_Zheng don't have time for Searchlight anymore
  • Fortunately, sapd1 agreed to contribute to Searchlight project more frequently, at least for the Stein cycle
  • To welcome new contributors, we changed the meeting time [1]. And we had a meeting today with sapd1 and me [2]
  • We agreed to make a use case for Searchlight with external resources such as K8S [2]
  • sapd1 wants to make a plugin for Octavia [2]
  • We also agreed to move the search bar of Searchlight to the top of the Horizon dashboard [3]

I'm waiting for a new era of Searchlight!!! \m/\m/\m/

References:

[1] https://review.openstack.org/#/c/618663/
[2] http://eavesdrop.openstack.org/meetings/openstack_search/2018/openstack_search.2018-11-19-13.51.log.html
[3] https://storyboard.openstack.org/#!/story/2004377

by Trinh Nguyen (noreply@blogger.com) at November 19, 2018 03:05 PM

Chris Dent

OpenStack Berlin Summit 2018

Last week I attended the OpenStack summit in Berlin. Of the several summits I've attended, this one was the most laid back. I imagine other people had different experiences, but my experience involved no surprises, no scandals, I neither got yelled at nor yelled at someone, and at least so far, I'm not ill. Success!

Here are some notes I took down while waiting for my plane home. For the most part I spent my time listening: other people said what I would have, so there was little need to butt in.

OpenStack, as a product, is healthy. Lots of organizations are using it successfully for many purposes and exploring using it for more. The concerns that people have are with getting it to work well, not getting it to work at all. Upgrades and complexity remain issues, but overall the system is becoming more manageable.

On the other hand, OpenStack as a project is struggling in some ways. Many of the subprojects have fewer regular contributors than they would prefer, and several regular contributors expressed concern they were going to be pulled onto other activities. This is even true for some of the larger and older projects like Nova. There were at least two forum sessions that touched on the need to enable more casual contribution, but these felt, to me, too oriented towards creating regular contributors, rather than changing things so that regular, high-attention contribution was not the norm. Making that transition is necessary as OpenStack enters maturity, but it will shake up traditional power structures a great deal.

Or we need to convince the contributing companies that make billions off open infrastructure to up their game.

OpenStack, by number of commits over a year, is still one of the largest open source projects. Something is working.

Speaking of maturity: OpenStack is now so mature the next summit will be called the Open Infrastructure Summit. This allows the event to more clearly claim to cover all the additional (and less mature) areas the Foundation encompasses.

At the board and joint leadership meeting the Foundation was granted the power to move some of the new areas from a pilot phase to whatever is next. As part of that, Zuul, Kata, Airship, and StarlingX have recently formalized or documented their governance structures. Notably, Zuul has avoided over-specifying things, preferring instead to let the participants drive. StarlingX, in contrast, has made some effort to ensure that some of the perceived weaknesses present in the OpenStack Technical Committee (e.g., few levers to pull with regard to technical direction) are addressed.

The recently merged Vision for OpenStack Clouds was well-received at the leadership meeting, in the sense of "it's a good idea to do this". I suspect that it will help to expose (and hopefully resolve) some of the disconnects between the strands of OpenStack which are targeting a "cloud operating system" and those targeting "infrastructure as a service". Both are valid.

One of those disconnects is the continuing need for enhanced platform awareness driven by the NFV community. EPA support creates a great deal of complexity in the OpenStack system and I remain not-fully-convinced that the relatively small performance gains are worth the years of labor to support such things in projects like nova and placement.

From discussion at the summit, it's pretty clear that the trajectory of placement is going to require some refactoring to deal with scale demands. People are talking about using single placement services for multiple clouds. There's still a lot of fairly normal web-performance related tweaking that can be done to placement to make it more zippy, so no crisis here, just work.

The venue (CityCube) was reasonable. We fit. The coffee didn't run out. I was worried that its distance from everywhere would be an issue, but the public transport in Berlin is great. On the negative side: The lunch food was dreadful.

Starting with the next summit (in Denver), the event will be immediately followed by the PTG. That will be exhausting. It will also be unfortunate to not have face to face time with collaborators more often. It would be interesting to set up more localised and grassroots hackathons. Time together not necessarily to make big plans, but rather to simply do some work together with coffee, food and a bed nearby. Who's in?

While you think about that, here's a video of a new friend I met at the Berlin Zoo the day after the summit.


by Chris Dent at November 19, 2018 02:30 PM

November 15, 2018

Aptira

A Day in the Life of OpenKilda: Jon Vestal, Telstra

What is OpenKilda? Telstra’s VP of Product Architecture, Jon Vestal, discussed this in the Automation Forum at Layer123’s SDN NFV World Congress 2018 in The Hague.

Jon presents Telstra’s dynamic global SDN network and what problems they solved with OpenKilda. He provides great insight into what sets OpenKilda apart from the other open source SDN Controllers available today.

Based in Singapore, Jon Vestal is a veteran of the telecoms industry with over 20 years of IT and telecommunications experience, from engineering and operations to product development and sales. In his current role as head of product architecture, Jon oversees the team responsible for the enhancement of the company’s core network products as well as the design and delivery of strategic product initiatives, including cloud computing.

OpenKilda is the SDN controller for global networks. It is designed to solve the problem of implementing a distributed SDN control plane for a network that spans the globe. OpenKilda addresses latency while providing a scalable SDN control and data plane and end-to-end flow telemetry.

In case you missed it, our Solutionauts Simon Sellar and Dr Farzaneh Pakzad have recently written a series of articles on OpenKilda:

Remove the complexity of networking at scale.
Learn more about our SDN & NFV solutions.


The post A Day in the Life of OpenKilda: Jon Vestal, Telstra appeared first on Aptira.

by Aptira at November 15, 2018 12:54 AM

November 14, 2018

OpenStack Superuser

And the Superuser Award goes to…

Berlin — The OpenStack community and Superuser editorial advisors have weighed in on the finalists and chosen City Network  at the OpenStack Berlin Summit Superuser Awards, sponsored by Zenko.

City Network’s research and development, professional services, education and engineering teams were one of five nominees for the Superuser Awards.

Previous winners, the Ontario Institute of Cancer Research (OICR), presented the award on stage during Wednesday’s keynotes.

Here are some details on the scale of their private cloud: “We run our public OpenStack based cloud in eight regions across three continents. All of our data centers are interconnected via private networks. In addition to our public cloud, we provide a pan-European cloud for verticals where regulatory compliance is paramount (e.g. banking and financial services, government, healthcare) addressing all regulatory challenges. Over 2,000 users of our infrastructure-as-a-service solutions run over 25,000 cores in production.”

Nominees for this round of awards included AdForm, Cloud&Heat, Linaro and ScaleUp Technologies.

The Superuser Awards launched in 2014 to recognize organizations that have used OpenStack to meaningfully improve their business while contributing back to the community. Previous winners include AT&T, CERN, China Mobile, Comcast, NTT Group, Paddy Power Betfair and UKCloud.

Stay tuned for the next cycle of nominations as the Superuser Awards head to the Denver Open Infrastructure Summit!

Cover photo // CC BY NC

The post And the Superuser Award goes to… appeared first on Superuser.

by Superuser at November 14, 2018 09:42 AM

Stefano Maffulli

OpenStack releases its most important artifact

The strongest legacy of the OpenStack Foundation is showing how to do open source at scale, with a multi-million dollar budget, going well beyond the garages, the university labs, the funded startups and the small non-profits of the years before. OpenStack built its practice on top of past experiences like the Apache way and the Ubuntu community habits. On top of those, the original teams at Rackspace and NASA built a fund-raising, business development and marketing operation never seen before for an open source project.

At the core of all that effort stood four strong principles, today published in a book embedding the practice of open source. These are what makes OpenStack different and as of today, it’s the only place where open source projects get help with their open collaboration practice, beyond IP and events management.

OpenStack was started with the belief that a community of equals, working together in an open collaboration, would produce better software, more aligned to the needs of its users and more largely adopted. It was therefore started from day 0 as an open collaboration model that includes as many individuals and organizations as possible, on a level playing field, with everyone invited to design open infrastructure software.

It was from these conditions that “The Four Opens” were born:

  • Open source

  • Open Design

  • Open Development

  • Open Community

Read the Four Opens book, just released by the OpenStack Foundation.

by smaffulli at November 14, 2018 08:46 AM

OpenStack Superuser

How code merged with fabric for a tribute to the OpenStack community

No one writes code in pencil, but Kendall Nelson thought a .yaml file would make a fetching pencil skirt.

It’s her tribute to the OpenStack community, offering a snapshot of the time she merged her inaugural patch long before joining the OpenStack Foundation as an upstream developer advocate. If you’re a project team lead (PTL) or were involved in one of the core projects then — August 15, 2015 — your work might be woven into the pattern.

Nelson also runs the Community Contributor Awards intended to honor the unsung heroes of the community with honorifics like the “Don’t Stop Believin'” cup.  She debuted the skirt onstage at the Berlin Summit while handing out the most recent edition of the awards.

The code on the skirt isn’t actually the code from her first patch: it’s the project.yaml file from the day her first patch got merged. The file shows all of the projects, the PTLs and the missions of the projects.

“I wanted that code instead of mine because it showed an overview of what the community was like at the time,” she tells Superuser. “And you can look at it now and see how much it has changed. How much we’ve evolved as a community in only a few years.”

The custom design is the handiwork of Shenova Fashion, which specializes in STEM-inspired garments. (Think Fibonacci-sequence sheaths and Mandelbrot fractal weekender bags.) In 2012, the designer and owner Holly Renee noticed a lack of fun, sophisticated pieces that expressed her love for science — so she decided to make them. Her first design? A spectacular Neuroscience Retina Dress.

Nelson says she has her sights on rolling out a less truncated version — a dress — with more community code.

“Contributing to OpenStack has been the most rewarding and fun job I’ve ever had,” Nelson says. “From the day I got my first patch merged till now has been an adventure that I want to share with as many people as I can. ”

 

The post How code merged with fabric for a tribute to the OpenStack community appeared first on Superuser.

by Nicole Martinelli at November 14, 2018 08:16 AM

OpenStack Community Contributor Awards: Berlin Summit edition

BERLIN — From the keynote stage, the Community Contributor Awards gave a special tip of the hat to those who might not be aware that they are valued.

These awards are a little informal and quirky but still recognize the extremely valuable work that everyone does to make OpenStack excel. These behind-the-scenes heroes were nominated by other community members.

There were three main categories: those who might not be aware that they are valued, those who are the active glue that binds the community together and those who share their knowledge with others.

OSF’s upstream developer advocate Kendall Nelson runs the program and handed out the honors. It wasn’t the only way she celebrated the community: she handed out the medals wearing a skirt patterned with community code.

Here are the winners from the Berlin Summit edition, in the words of the folks who nominated them:

The Giving Tree – Melanie Witt

Melanie Witt has taken on the role of Nova project team lead, one of the largest OpenStack projects. She’s done an amazing job running the project, constantly listening to operators and doing a great job despite all the work involved. She’s good at staying impartial and constantly balancing the competing demands and voices around her. She has also made great efforts to better organize the focuses of the strained core team by pushing them to do review runways.

Bonsai Caretaker – Swapnil Kulkarni

He’s been maintaining the JetBrains PyCharm licenses, used by many community contributors, for the OpenStack community for a long time now. JetBrains recognized this as well when they interviewed him about his contributions to the community. His contributions often go unnoticed even though most of us benefit from them daily. Here’s one more testimonial about it.

The Duct Tape Medal – Eric Fried

Eric Fried has been a recent huge force in Nova and many other projects in OpenStack, reviewing and contributing code at an astonishing rate and with a very high level of quality. He takes on and doggedly sticks with tasks which exhaust others and remembers to address any side issues that crop up.
He’s worked on everything from PowerVM support in Nova to helping get the placement service off the ground. It doesn’t matter what he picks up, he gets it working.

The ‘Does Anyone Actually Use This?’ Trophy – Lingxian Kong

  • He created and led the Qinling project to fill the serverless gap in the OpenStack ecosystem.
  • Successfully made Qinling an official OpenStack project and made its first release for Rocky.
  • Worked closely with other OpenStack projects to make Qinling easy to use.
  • Also actively contributed to Barbican and Octavia, and was responsible for getting them deployed in Catalyst Cloud (an OpenStack-based public cloud in New Zealand).
  • Implemented the Octavia Golang support and Ansible module.
  • Found time to act as a mentor in the Outreachy internship program.

Hero of the People – Rico Lin

As the PTL of Heat, Rico Lin has been very helpful with managing the orchestration service and its related integrations. He has also put a lot of effort into bringing developers, users and operators together to build up the community. He’s also been active in community discussions around topics including automatic upgrades, self-healing and Kubernetes. He’s always a friendly face for anyone who joins this community, ready to share and to offer guidance or help to anyone who wants to learn more.

Don’t Stop Believin’ Cup – Fatema Khaled

Fearless Fatema Khaled has come storming onto the OpenStack scene like a sponge with jet propulsion and a bravery shield. She came to her Outreachy internship ready to seek out and soak up everything she could about OpenStack. When you were interning, did you fearlessly approach development teams to find out how you could use your skills to help? Fatema did! After being a significant part of the Storyboard development, she jumped on a plane to Denver to find out what her next move in the OpenStack community should be. She went room to room, introducing herself and learning about projects, and decided to make a Swift contribution to start off. Who starts with Swift?! She did! Although she’s brand new in our community, her fearless attitude is one to be celebrated and I’m sure she’ll go on to make significant contributions in this community!

Open Infrastructure Shield – Frank Kloeker

He has been the I18n team PTL for three OpenStack cycles, and I really appreciate his long-standing leadership of the team. Thanks to him, the I18n team has a good balance between Asian countries and other countries. Translation efforts can easily become biased toward a limited set of countries, and previous PTLs were from China, Japan and Korea, which might have skewed the team's activity toward those countries. The current good balance might not have been possible without his energetic contribution and leadership of the I18n team.
He has spent a lot of his personal time on OpenStack contribution and evangelism (upstream and downstream), not only through the I18n effort but also within his company and country. On the I18n side, this year he went to OpenStack Days events near Germany, where he lives; [1] is one example of how he shared I18n activities with other European countries, and [2-4] show how he brought what he learned through OpenStack contribution back to his company's open source ecosystem. Finally, as far as I know, he's one of the main organizers of the upcoming Berlin Summit and spends a lot of his personal time making the Summit better organized.

Keys to Stack City – Andreas Jaeger

  • He does it all: writing docs, fixing CI jobs, coordinating large OpenStack-wide CI changes, helping users with Zuul, and managing review queues despite the incredible volume we face. It is easy to underestimate how much work Andreas is doing because he does it quietly and happily. Then Andreas goes on vacation for a few weeks and we wonder why things stop moving so smoothly.
  • His contributions to OpenStack span many years, many projects, and many facets of work. Without Andreas we would likely need multiple individuals to pick up the work.

Mentor of Mentors – Victoria Martinez de la Cruz

  • She’s been a crucial person in on-boarding new contributors through the Outreachy Internship program as the coordinator for the OpenStack community. I don’t have data behind this, but between being a mentor and coordinating dinners, she has created new contributors through this program.
  • She just stepped down this cycle as the coordinator and I think that she deserves recognition for how much she has not only impacted that process, but also how many new contributors she has onboarded into the community.

Stay tuned to Superuser for info on how to nominate your heroes for the next Summit!

The post OpenStack Community Contributor Awards: Berlin Summit edition appeared first on Superuser.

by Superuser at November 14, 2018 08:15 AM

November 13, 2018

StackHPC Team Blog

Zero-Touch Provisioning using Ironic Inspector and Dell iDRAC

How long does it take a team to bring up new hardware for private cloud? Despite long days at the data centre, why does it always seem to take longer than initial expectations? The commissioning of new hardware is tedious, with much unnecessary operator toil.

The scientific computing sector is already well served by tools for streamlining this process. The commercial product Bright Cluster Manager and open-source project xCAT (originally from IBM) are good examples. The OpenStack ecosystem can learn a lot from the approaches taken by these packages, and some of the gaps in what OpenStack Ironic can do have been painfully inconvenient when using projects such as TripleO and Bifrost at scale.

This post covers how this landscape is changing. Using new capabilities in OpenStack's Ironic Inspector and new support for manipulating network switches with Ansible, we can use Ironic and Ansible together to bring zero-touch provisioning to OpenStack private clouds.

Provision This...

Recently we have been working on a performance prototyping platform for the SKA telescope. In a nutshell, this project aims to identify promising technologies to pick up and run with as the SKA development ramps up over the next few years. The system features a number of compute nodes with more exotic hardware configurations for the SKA scientists to explore.

This system uses Dell R630 compute nodes, running as bare metal, managed using OpenStack Ironic, with an OpenStack control plane deployed using Kolla.

The system has a number of networks that must be managed effectively by OpenStack, without incurring any performance overhead. All nodes have rich network connectivity - something which has been a problem for Ironic, and which we are also working on.

Physical networks in the deployment
  • Power Management. Ironic requires access to the compute server baseboard management controllers (BMCs). This enables Ironic to power nodes on and off, access serial consoles and reconfigure BIOS and RAID settings.
  • Provisioning and Control. When a bare metal compute node is being provisioned, Ironic uses this network interface to network-boot the compute node, and transfer the instance software image to the compute node's disk. When a compute node has been deployed and is active, this network is configured as the primary network for external access to the instance.
  • High Speed Ethernet. This network will be used for modelling the high-bandwidth data feeds being delivered from the telescope's Central Signal Processor (CSP). Some Ethernet-centric storage technologies will also use this network.
  • High Speed Infiniband. This network will be reserved for low-latency, high-bandwidth messaging, either between tightly-coupled compute or compute that is tightly coupled to storage.

Automagic Provisioning Using xCAT

Before we dive into the OpenStack details, let's make a quick detour with an overview of how xCAT performs what it calls "Automagic provisioning".

  • This technique only works if your hardware attempts a PXE boot in its factory default configuration. If it doesn't, well that's unfortunate!
  • We start with the server hardware racked and cabled up to the provisioning network switches. The servers don't need configuring - that is automatically done later.
  • The provisioning network must be configured with management access and SNMP read access enabled. The required VLAN state must be configured on the server access ports. The VLAN has to be isolated for the exclusive use of xCAT provisioning.
  • xCAT is configured with addresses and credentials for SNMP access to the provisioning network switches. A physical network topology must also be defined in xCAT, which associates switches and ports with connected servers. At this point, this is all xCAT knows about a server: that it is an object attached to a given network port.
  • The server is powered on (OK, this is manual; perhaps "zero-touch" is an exaggeration...), and performs a PXE boot. For a DHCP request from an unidentified MAC, xCAT will provide a generic introspection image for PXE-boot.
  • xCAT uses SNMP to trace the request to a switch and network port. If this network port is in xCAT's database, the server object associated with the port is populated with introspection details (such as the MAC address).
  • At this stage, the server awaits further instructions. Commissioning new hardware may involve firmware upgrades, in-band BIOS configuration, BMC credentials, etc. These are performed using site-specific operations at this point.

We have had some experience with various approaches - Bright Cluster Manager, xCAT, TripleO and OpenStack Ironic. This gives us a pretty good idea of what is possible, and of the benefits and weaknesses of each. As an example, the xCAT flow offers many advantages over a manual approach to hardware commissioning - once it is set up. Some cloud-centric infrastructure management techniques can be applied to simplify that process.

An Ironic Inspector Calls

Here's how we put together a system, built around Ironic, for streamlined infrastructure commissioning using OpenStack tools. We've collected together our Ansible playbooks and supporting scripts as part of our new Kolla-based OpenStack deployment project, Kayobe.

One principal difference with the xCAT workflow is that Ironic's Inspector does not make modifications to server state, and by default does not keep the introspection ramdisk active after introspection or enable an SSH login environment. This prevents us from using xCAT's technique of invoking custom commands to perform site-specific commissioning actions. We'll cover how those actions are achieved below.

  • We use Ansible network modules to configure the management switches for the Provisioning and Control network. In this case, they are Dell S6010-ON network-booting switches, and we have configured Ironic Inspector's dnsmasq server to boot them with Dell/Force10 OS9. We use the dellos9 Ansible module.
  • Using tabulated YAML data mapping switch ports to compute hosts, Ansible configures the switches with port descriptions and membership of the provisioning VLAN for all the compute node access ports. Some other basic configuration is applied to set the switches up for operation, such as enabling LLDP and configuring trunk links.
  • Ironic Inspector's dnsmasq service is configured to boot unrecognised MAC addresses (for servers not in the Ironic inventory) to perform introspection on those servers.
  • The LLDP datagrams from the switch are received during introspection, including the switch port description label we assigned using Ansible.
  • The introspection data gathered from those nodes is used to register the new nodes in the Ironic inventory, using Inspector's discovery capabilities (the configuration that enables this is sketched after this list).
  • With Ironic Inspector's rule-based transformations, we can populate the server's state in Ironic with BMC credentials, deployment image IDs and other site-specific information. We name the nodes using a rule that extracts the switch port description received via LLDP.
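
The discovery behaviour in the last two steps is driven by ironic-inspector configuration. The snippet below is a rough illustration rather than our exact configuration; the hook names and the enrollment driver are assumptions that vary between Inspector releases and deployments.

# ironic-inspector inspector.conf (illustrative only)
[processing]
# Enroll nodes that Inspector does not already know about.
node_not_found_hook = enroll
# Collect LLDP data so that rules can see switch and port information.
processing_hooks = $default_processing_hooks,lldp_basic,local_link_connection

[discovery]
# Driver assigned to newly enrolled nodes (release-dependent).
enroll_node_driver = agent_ipmitool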

So far, so good, but there's a catch...

Dell OS9 and LLDP

Early development was done on other switches, including simpler beasts running Dell Network OS6. It appears that Dell Network OS9 does not yet support some simple-but-critical features we had taken for granted for the cross-referencing of switch ports with servers. Specifically, Dell Network OS9 does not support transmitting LLDP's port description TLV that we were assigning using Ansible network modules.

To work around this we decided to fall back to the same method used in xCAT: we match using switch address and port ID. To do this, we create Inspector rules for matching each port and performing the appropriate assignment. And with that, the show rolls on.
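
As an illustration, such a rule can be expressed in the same Ansible-variable format as the credentials rule shown later in this post. The variable name and the data:// field paths below are placeholders: the exact paths depend on which LLDP processing hooks are enabled and on the Inspector release.

# Illustrative only: name the node attached to a known switch port.
inspector_rule_name_compute_1:
  description: "Name the node attached to port Te1/1/1 on a known switch"
  conditions:
    # The chassis ID is typically the switch's MAC address; placeholder paths.
    - field: "data://all_interfaces.eth0.lldp_processed.switch_chassis_id"
      op: "eq"
      value: "aa:bb:cc:dd:ee:ff"
    - field: "data://all_interfaces.eth0.lldp_processed.switch_port_id"
      op: "eq"
      value: "TenGigabitEthernet 1/1/1"
  actions:
    - action: "set-attribute"
      path: "name"
      value: "compute-1"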

Update: Since the publication of this article, newer versions of Dell Networking OS9 have gained the capability of advertising port descriptions, making this workaround unnecessary. For S6010-ON switches, this is available since version 9.11(2.0P1) using the advertise interface-port-desc description LLDP configuration.

Ironic's Catch-22

Dell's defaults for the iDRAC BMC assign a default IP address and credentials. All BMC ports on the Power Management network begin with the same IP. IPMI is initially disabled, allowing access only through the WSMAN protocol used by the idracadm client. In order for Ironic to manage these nodes, their BMCs each need a unique IP address on the Power Management subnet.

Ironic Inspection is designed to be a read-only process. While the IPA ramdisk can discover the IP address of a server's BMC, there is currently no mechanism for setting the IP address of a newly discovered node's BMC.

Our solution involves more use of the Ansible network modules.

  • Before inspection takes place we traverse our port-mapping YAML tables, putting the network port of each new server's BMC in turn into a dedicated commissioning VLAN.
  • Within the commissioning VLAN, the default IP address can be addressed in isolation. We connect to the BMC via idracadm, assign it the required IP address, and enable IPMI.
  • The network port for this BMC is reverted to the Power Management VLAN.

At this point the nodes are ready to be inspected. The BMCs' new IP addresses will be discovered by Inspector and used to populate the nodes' driver info fields in Ironic.
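
A much-simplified sketch of that sequence for a single BMC is shown below. It is not the real idrac-bootstrap.yml: the interface names, VLAN IDs, addresses and credentials are examples, and the connection settings for the dellos9 modules are omitted for brevity.

# Simplified illustration of commissioning one iDRAC; not the real
# idrac-bootstrap.yml. Names, VLANs, addresses and credentials are examples.
- name: Move the BMC switch port into the commissioning VLAN
  hosts: switches
  gather_facts: no
  tasks:
    - name: Put Te1/1/10 into VLAN 42 (commissioning)
      dellos9_config:
        parents: ["interface vlan 42"]
        lines: ["untagged TenGigabitEthernet 1/1/10"]

- name: Configure the iDRAC via its factory-default address
  hosts: localhost
  gather_facts: no
  tasks:
    - name: Assign the BMC a static IP on the Power Management subnet
      command: >
        racadm -r 192.168.0.120 -u root -p calvin
        setniccfg -s 10.10.0.21 255.255.255.0 10.10.0.1
    - name: Enable IPMI over LAN (syntax varies with iDRAC generation)
      command: >
        racadm -r 192.168.0.120 -u root -p calvin
        config -g cfgIpmiLan -o cfgIpmiLanEnable 1

- name: Revert the BMC switch port to the Power Management VLAN
  hosts: switches
  gather_facts: no
  tasks:
    - name: Put Te1/1/10 back into VLAN 10 (power management)
      dellos9_config:
        parents: ["interface vlan 10"]
        lines: ["untagged TenGigabitEthernet 1/1/10"]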

Automate all the Switches

The configuration of a network, when applied manually, can quickly become complex and poorly understood. Even simple CLI-based automation like that used by the dellos Ansible network modules can help to grow confidence in making changes to a system, without the complexity of an SDN controller.

Some modern switches such as the Dell S6010-ON support network booting an Operating System image. Kayobe's dell-switch-bmp role configures a network boot environment for these switches in a Kolla-ansible managed Bifrost container.

Once booted, these switches need to be configured. We developed the simple dell-switch role to configure the required global and per-interface options.

Switch configuration is codified as Ansible host variables (host_vars) for each switch. The following is an excerpt from one of our switch's host variables files:

# Host/IP on which to access the switch via SSH.
ansible_host: <switch IP>

# Interface configuration.
switch_interface_config:
  Te1/1/1:
    description: compute-1
    config: "{{ switch_interface_config_all }}"
  Te1/1/2:
    description: compute-2
    config: "{{ switch_interface_config_all }}"

As described previously, the interface description provides the necessary mapping from interface name to compute host. We reference the switch_interface_config_all variable which is kept in an Ansible group variables (group_vars) file to keep things DRY. The following snippet is taken from such a file:

# User to access the switch via SSH.
ansible_user: <username>

# Password to access the switch via SSH.
ansible_ssh_pass: <password>

# Interface configuration for interfaces with controllers or compute
# nodes attached.
switch_interface_config_all:
  - "no shutdown"
  - "switchport"
  - "protocol lldp"
  - " advertise dot3-tlv max-frame-size"
  - " advertise management-tlv management-address system-description system-name"
  - " advertise interface-port-desc"
  - " no disable"
  - " exit"

Interfaces attached to compute hosts are enabled as switchports and have several LLDP TLVs enabled to support inspection.

We wrap this up in a playbook and make it user-friendly through our CLI as the command kayobe physical network configure.
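
Under the hood the playbook is little more than an application of the role across the switch group. A minimal sketch is shown below; the role variable passed through is illustrative rather than the exact Kayobe interface.

# physical-network.yml: minimal, illustrative wrapper around the dell-switch role.
- name: Ensure physical network switches are configured
  hosts: switches
  gather_facts: no
  roles:
    - role: dell-switch
      # Pass the per-switch interface configuration shown above into the role.
      dell_switch_interface_config: "{{ switch_interface_config }}"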

iDRAC Commissioning

The idrac-bootstrap.yml playbook used to commission the box-fresh iDRACs required some relatively complex task sequencing across multiple hosts using multiple plays and roles.

A key piece of the puzzle involves the use of an Ansible task file included multiple times using a with_dict loop, in a play targeted at the switches using serial: 1. This allows us to execute a set of tasks for each BMC in turn. A simplified example of this is shown in the playbook below:

- name: Execute multiple tasks for each interface on each switch serially
  hosts: switches
  serial: 1
  tasks:
    - name: Execute multiple tasks for an interface
      include: task-file.yml
      with_dict: "{{ switch_interface_config }}"

Here we reference the switch_interface_config variable seen previously. task-file.yml might look something like this:

- name: Display the name of the interface
  debug:
    var: item.key

- name: Display the description of the interface
  debug:
    var: item.value.description

This commissioning technique is clearly not perfect, having an execution time that scales linearly with the number of servers being commissioned. That said, it automated a labour-intensive manual task on the critical path of our deployment, and runs in a relatively short space of time: about 20 seconds per node.

We think there is room for a solution that is more integrated with Ironic Inspector and would like to return to the problem before our next deployment.

Introspection Rules

Ironic Inspector's introspection rules API provides a flexible mechanism for processing the data returned from the introspection ramdisk that does not require any server-side code changes.

There is currently no upstream Ansible module for automating the creation of these rules. We developed the ironic-inspector-rules role to fill the gap and continue boldly into the land of Infrastructure-as-code. At the core of this role is the os_ironic_inspector_rule module which follows the patterns of the upstream os_* modules and provides us with an Ansible-compatible interface to the introspection rules API. The role ensures required python dependencies are installed and allows configuration of multiple rules.
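
As a rough sketch of how the module can be driven (the parameter names here are assumptions following the upstream os_* module conventions; the role itself feeds the module from a list of rule variables like the one defined just below):

- name: Ensure the IPMI credentials introspection rule exists
  os_ironic_inspector_rule:
    # Cloud name from clouds.yaml; illustrative.
    cloud: alaska
    state: present
    description: "{{ inspector_rule_ipmi_credentials.description }}"
    conditions: "{{ inspector_rule_ipmi_credentials.conditions }}"
    actions: "{{ inspector_rule_ipmi_credentials.actions }}"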

With this role in place, we can define our required introspection rules as Ansible variables. For example, here is a rule definition used to update the BMC credentials of a newly discovered node:

# Ironic inspector rule to set IPMI credentials.
inspector_rule_ipmi_credentials:
  description: "Set IPMI driver_info if no credentials"
  conditions:
    - field: "node://driver_info.ipmi_username"
      op: "is-empty"
    - field: "node://driver_info.ipmi_password"
      op: "is-empty"
  actions:
    - action: "set-attribute"
      path: "driver_info/ipmi_username"
      value: "{{ inspector_rule_var_ipmi_username }}"
    - action: "set-attribute"
      path: "driver_info/ipmi_password"
      value: "{{ inspector_rule_var_ipmi_password }}"

By adding a layer of indirection to the credentials, we can provide a rule template that is reusable between different systems. The username and password are then configured separately:

# IPMI username referenced by inspector rule.
inspector_rule_var_ipmi_username:

# IPMI password referenced by inspector rule.
inspector_rule_var_ipmi_password:
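
A deployment then only needs to supply the two values, for example in an environment-specific vars file (values below are placeholders):

# Environment-specific overrides; values are placeholders.
inspector_rule_var_ipmi_username: "admin"
inspector_rule_var_ipmi_password: "{{ vault_ipmi_password }}"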

This pattern is commonly used in Ansible as it allows granular customisation without the need to redefine the entirety of a complex variable.

Share and Enjoy

Bringing it all together, our deployment uses Ironic, Ansible and friends to boot and configure Dell network switches, and then in turn boot, commission and configure Dell compute servers.

This deployment demonstrates using OpenStack to deliver zero-touch provisioning. Everything about the deployment infrastructure is defined in code. What's more, by using OpenStack our zero-touch provisioning will develop at the pace of cloud technology. We believe this project will rapidly surpass what is possible using conventional cluster management techniques.

Alaska control and compute nodes

Everything described here is available as part of Kayobe, our new open source project for automating inventory management within a Bifrost and Kolla environment. Where any of the roles provided by Kayobe may have wider appeal, we will consider making them available on Ansible Galaxy for others to use.

We have grand ambitions for Kayobe, and we hope to be speaking more about the capabilities the project is developing in due course.

Acknowledgements

  • With thanks to Matt Raso-Barnett from Cambridge University for a detailed overview of how xCAT zero-touch provisioning was used for deploying the compute resource for the Gaia project.

by Mark Goddard at November 13, 2018 05:00 PM

OpenStack Superuser

From OpenStack to open infrastructure: A project governance update

Last year, at the OpenStack Summit in Sydney, the OpenStack Foundation (OSF) unveiled a new strategy. Realizing the variety of open infrastructure use cases, and recognizing that integration is the largest barrier to its further adoption, we shifted our focus from being solely about the production of the OpenStack software to more broadly helping organizations embrace open infrastructure: using and combining open source solutions to meet their IT infrastructure needs. This strategy involves finding common use cases, collaborating across communities, supporting the creation of missing technology pieces and testing everything end to end.

Since open infrastructure use cases are pretty broad, we defined four strategic focus areas where the OSF should help: datacenter cloud (the traditional OpenStack use case), container infrastructure, edge computing and CI/CD. Beyond OpenStack (which helps, directly or indirectly, with all of those areas), we supported the emergence of new open collaborations around missing technology pieces in those strategic focus areas.

In December 2017, we launched Kata Containers, our first pilot project. In May 2018, we established Zuul, the engine behind our project infrastructure CI systems, as a standalone project. More recently, we started piloting two additional projects: Airship and StarlingX. All those pilot projects were set up with their own independent branding and separate technical governance.

These four pilot projects allowed us to learn a lot, and over the past five months we have worked with the OSF Board and the community to put more structure around the future of our work with additional open infrastructure projects. The proposed structure was presented and refined at Board meetings, community meetings, on mailing lists and at in-person events over the past months. It was finally approved during the Board meeting held at the OpenStack Summit in Berlin.

The Four Opens

The Four Opens were created in 2010 as the base principles to sustain the creation and growth of the OpenStack project. We believe that in order to create a successful piece of infrastructure technology, open source is not enough: you need open collaboration across all stakeholders, on a level playing field, to reach the levels of contribution and adoption necessary for success.

The Four Opens are:

  • Open source: Everything is licensed under an OSI-compliant license, we do not support the open core model
  • Open design: Design does not happen behind closed doors between a selected few and involves users directly
  • Open development: Everything happening in development is transparent and accessible
  • Open community: Anyone can join and be elected to project leadership positions, no appointed seats

Over the past eight years, the Four Opens have proved a great model for enabling this open collaboration mindset, and their application has expanded well beyond the realm of software development. They’re now the OSF way: an integral part of how we do things at the Foundation and within the open source projects that we support. In order to better document the OSF way, we’ve recently started an effort to write a community book on the Four Opens. More on that in this post by Chris Hoge.

Strategic focus areas (SFA)

Open infrastructure is an ever-evolving construct, changing as the computing needs of organizations evolve with the technology. As mentioned above, SFAs are driven by usage scenarios and help direct the OSF action to specific segments of the open infrastructure market.

As such, defining SFAs will be part of the regular strategic planning activities of the OSF Board of Directors. Focus areas may be proposed, discussed, approved or abandoned during strategic reviews. Those strategic reviews should happen as-needed, but at least annually.

Projects

To build open infrastructure solutions, we look forward to integrating existing open source projects produced by adjacent open source communities. However, some gaps may persist, and extra technology pieces may be necessary to meet the goals of our strategic focus areas. The OSF enables those projects to be successfully set up as open collaborations, by providing IP management, a set of base collaboration rules (the Four Opens) and upstream and downstream community support services. Projects are not tied to a specific SFA: they should ideally help with multiple SFAs. The idea here is not to bring on dozens of projects, but to curate a set of strong projects that help solve real-world problems for our users.

All projects supported by the OSF follow the Four Opens. Open source projects that do not wish to be set up as open collaborations can still be included and integrated in open infrastructure solutions. However, they’ll see little benefit in being hosted by the OSF. Projects may be composed of several components or deliverables, as long as they are under a common defined scope and governance.

Projects go through two stages, which define the level of engagement of OSF resources:

Pilot projects

As part of meeting the goals of our strategic focus areas, the OSF staff will encourage the formation of new openly developed projects. It will also evaluate and engage with existing projects that fill gaps in open infrastructure adoption and could be interested in being set up as an open collaboration following the Four Opens.

The OSF staff (as represented by its officers) can create or select pilot projects where it identifies such promising new technology pieces and a potential for open collaboration. It then helps the pilot projects by establishing initial branding, setting up project websites, guiding them to set up proper governance, and understanding and adopting the Four Opens.

Confirmed projects

This initial bootstrapping phase is completed once the project operates under an open community governance and has produced at least one release under that model. At that point, pilot projects shall be reviewed by the Board, in order to decide on further investment of OSF resources to support them over the long term.

With the input of other OSF leadership bodies, the Board will review the strategic alignment of the project with the Foundation’s SFAs, but also their progress in setting up an effective open collaboration following the Four Opens. It may then decide to confirm the project as a long-term OSF investment, abandon the pilot or defer its decision.

Practical implications

The first practical implication of this OSF project governance structure is the need to adapt the OSF bylaws. In order to allow the OSF to support open infrastructure projects beyond just OpenStack, the Foundation bylaws needed a number of additions. Those changes were approved by the Board during the meeting held at the OpenStack Summit in Berlin and will appear in the ballot for individual members to vote on during the January Foundation election cycle. You can read this post by Jonathan Bryce to learn more about it.

Another practical implication is around our project infrastructure: the massive, openly operated set of open-source services that our community infrastructure team runs to support the development of our software. While it always supported more than just OpenStack, it was always called the “OpenStack project infrastructure” and used the OpenStack name throughout. In order to more easily reuse it for other projects and promote its open development model beyond OpenStack in the future, a rebranding was in order. The current plan is to call it OpenDev and over the coming year to gradually move it to the opendev.org domain. You can read this post by Clark Boylan to learn more.

There will be a lot of other changes over the coming year as we put this model into place. If you have questions or comments, please feel free to reach out on the Foundation mailing-list, or in person at events.

If you’re at the OpenStack Summit in Berlin, the Foundation leadership team will hold an Ask Me Anything session at 3:20 p.m. on Wednesday. Join us!

The post From OpenStack to open infrastructure: A project governance update appeared first on Superuser.

by Thierry Carrez at November 13, 2018 08:20 AM

Objectif Libre

Objectif Libre announces the release of its new automation tool for OpenStack: “Build & Lifecycle Manager”.

Taking the opportunity of the OpenStack Summit, which opens today in Berlin, Objectif Libre is proud to announce the official release of its OpenStack cloud automation solution, “Build & Lifecycle Manager”. This tool, based on Ansible, aims to simplify the deployment and updating of OpenStack clouds (community version).

Customizable for each customer environment, this tool was born entirely from the expertise of Objectif Libre’s consultants and is maintained in step with OpenStack releases. Customers therefore benefit from updates and functional upgrades from the company’s OpenStack experts.

Through an annual subscription, this offer includes:

  • The provisioning of the tool on customers’ repositories to deploy an OpenStack cloud, in a version that follows the community’s biannual release cycle.
    The tool, based on Ansible, automatically deploys the main OpenStack APIs: the 8 core APIs as well as more than 10 of the most useful others, including Ironic, Designate, Manila, Octavia…
  • OpenStack version upgrades, allowing the customer to perform up to 2 fully automated and tested OpenStack upgrades per year;
  • Access to the new features developed by our R&D teams: the addition of new APIs, the integration of new functionality, updates to the installation and operating documentation…

The Build & Lifecycle Manager is based on an Ansible playbook entirely developed by our team.

Its advantages over existing solutions (the community playbooks or other automation tools such as Puppet):

  • it is fully mastered by Objectif Libre’s R&D teams;
  • there is no need to install a specific tool beforehand, since it can be launched like any Ansible playbook;
  • being based on Ansible, it is simple and easy to understand and to work with;
  • customers can modify it (adding APIs, packages, etc.), and such modifications are integrated into the tool and fully supported by our teams.

A standard platform deploys in less than 20 minutes, and the tool can also deploy complex platforms.

Features already under development include the deployment of the OpenStack APIs in containers.

Objectif Libre’s long-standing customers in France, some of whom have even entrusted the operation of their OpenStack platforms to Objectif Libre, already benefit from this offer.

The post Objectif Libre announces the release of its new automation tool for OpenStack: “Build & Lifecycle Manager” appeared first on Objectif Libre.

by Claire at November 13, 2018 06:00 AM

Trinh Nguyen

Searchlight weekly report - Stein R-22



Just one week before the Berlin Summit [1], we were finally able to review and merge some old patches:
  • Make search settings themeable and simpler [2]
  • Add Favorites ability for search queries [3]
  • Add cover job for searchlight [4]
  • Add Searchlight status upgrade check [5]
  • Remove i18n.enable_lazy() call from searchlight.cmd (this fixes a bug in [5]) [6]
  • Add cover job for python-searchlightclient [7]
  • Fix tox coverage test of python-searchlightclient ([7] depends on this) [8]
For the next couple of weeks, we will continue working on developing the use cases for Searchlight.

Greatness is coming!!!!

References:

[1] https://www.openstack.org/summit/berlin-2018/
[2] https://review.openstack.org/#/c/367555/
[3] https://review.openstack.org/#/c/367124/
[4] https://review.openstack.org/#/c/616056/
[5] https://review.openstack.org/#/c/613789/
[6] https://review.openstack.org/#/c/615594/
[7] https://review.openstack.org/#/c/616058/
[8] https://review.openstack.org/#/c/616156/

by Trinh Nguyen (noreply@blogger.com) at November 13, 2018 02:49 AM

November 12, 2018

OpenStack Superuser

How to navigate the OpenStack Summit Berlin agenda

I love data. With the OpenStack Summit Berlin agenda going live this morning, I decided to take a look at some of the math behind November’s event. More than 100 sessions and workshops covering 35 open source projects over nine tracks: that’s a lot to cover in three days. It makes it even more challenging to build an onsite schedule while still leaving yourself a chance to navigate the hallway track and the collaborative Forum sessions, which will be added to the schedule in the upcoming weeks.

So who exactly can you collaborate with in Berlin? Among the Summit speakers alone, there are 256 individuals from 193 companies and 45 countries that you may run into during the hallway track.

Before I start, I want to say a big thank you to the programming committee members who worked very hard creating the Summit schedule. It’s not an easy task: taking over 750 submissions from over 500 companies and turning them into content that fits within 100 speaking slots.

Now, to take full advantage of the incredible talks that are planned for November, I wanted to share a few tips that I find helpful when putting my schedule together.

Start with the 101

Whether it’s your first Summit or you’re new to a project and want to get involved, there are a lot of sessions and workshops for you. You can either search for sessions that are tagged as 101 or you can filter the schedule for sessions marked as beginner. If there’s a particular project where you want to begin contributing, project on-boarding sessions will be added soon.

If this is your first Summit, I would recommend planning to attend some of the networking opportunities that are planned, including the opening night Open Infrastructure Marketplace Mixer.

Find the users

If there is anything I love more than data, it’s meeting new users and catching up with those I know. This makes the case study tag one of my most frequently used filters. If you are like me and enjoy learning how open infrastructure is being used in production, the Berlin Summit will not disappoint. From BMW sharing its CI/CD strategy with Zuul to Adobe Advertising Cloud sharing its OpenStack upgrade strategy, there are a lot of users sharing their open infrastructure use cases and strategies.

There are a few new case studies that have really caught my eye and have already landed on my personal schedule:

Filter by use case

Whether you’re interested in edge computing, CI/CD or artificial intelligence (AI), the Summit tracks provide a way to filter the sessions to find operators, developers and ecosystem companies pursuing that use case.

The number of sessions allocated to each track is based on the number of submissions received during the call for presentations (CFP) process. For the Berlin Summit, here is the track breakdown by number of sessions:

Search the relevant open source projects

It was not a typo earlier when I mentioned that there are over 35 open source projects covered by the sessions at the Summit. Whether you’re trying to find one of the 45 Kubernetes sessions or a TensorFlow session on AI, the project-specific tags enable you to meet the developers and operators behind these projects.

Here are the top 10 open source projects and the number of sessions you can explore for each project:

Now, it’s time to start building your schedule. The official Summit mobile app will be available in the upcoming weeks, but you can still build your personal schedule in the web browser. Stay tuned on Superuser as we will feature top sessions by use case in the upcoming weeks and a few content themes spread across all nine tracks.


The post How to navigate the OpenStack Summit Berlin agenda appeared first on Superuser.

by Allison Price at November 12, 2018 09:46 AM
