February 22, 2017

OpenStack Superuser

How the Open Networking Foundation pioneers innovation

The Open Networking Foundation (ONF), founded in 2011, first broke new ground on the software-defined networking frontier.

Now, with over 200 members and a board that includes AT&T, Google, NTT Communications, SK Telecom and Verizon, they’re streamlining operations. At the recent Linux Foundation Open Source Leadership Summit, they launched the Open Innovation Pipeline, which marries efforts of the ONF and Open Networking Lab (ON.Lab). Merging the two organizations will take the better part of 2017, but that won’t halt progress.

Superuser talks to ONF’s chief development officer Bill Snow about how the telcos play together, the upshot of the Pipeline for XOS — a service management toolkit built on top of OpenStack — and how you can get involved.

What’s your role with the ONF?

I started the Open Networking Lab (ON.Lab) with Guru Parulkar five years ago and our mission is to bring better networks for the public good. So, for five years we have been developing platforms and tools for doing that. The big one is ONOS, the controller. As we built out ONOS, we did so by working with network operators on typical use cases they wanted. One of those was called CORD, and CORD started as a residential access use case, a way to bring agility and better economics to the access network for these optical connections, and we did that working closely with AT&T… My current role is to run the engineering team at the Open Networking Lab that provides a core set of developers to both ONOS and CORD.

What compelled you to get involved with the Open Networking Foundation?

I spent my career in networking and watched it get locked up. And without innovation, things kind of shrivel and die. That was fine as long as everything was just connected computers. But when we moved to the world of mobile and video became so huge, all of a sudden you had all these network operators. And there were no longer the economics for them to keep growing their network to keep up with these workloads. So it was a tremendous opportunity to help the whole ecosystem move to more fertile ground for innovation and to bring much better communications infrastructures. So that’s been my motivation.

How do these competing companies work together? How much refereeing is involved?

They play amazingly well together. In fact at the Open Networking Lab, we have as many partner engineers and operator engineers as we do lab engineers and, quite frankly, we kind of lose track of who works for whom. So for example you have SK Telecom, Verizon, AT&T working together hand-in-hand on the mobile version of CORD. And although they end up being competitors in some markets, they also have the same goal for making this infrastructure much more agile, much more cost efficient. So that gives them really some common ground to work hard together.

What kind of contributions are you looking for now from the outside?

We have so many solutions and so many activities. We have 17 partner organizations, but we also have over 150 collaborating organizations. So there’s a lot of companies involved in the different use cases that we’re driving for the operators. And what you saw in the announcement of the Innovation Pipeline was really us just articulating the process that we use to take all of this componentry, whether it’s disaggregated hardware, monetized hardware, or the open-source software pieces. Somebody has to bring those together into solutions for the operator.
This creates a new value chain and along the value chain different collaborators or partners bring in different pieces. And they work together to bring that whole solution either to a POC or to a lab trial or field trial, maybe even out into production. So everything we undertake starts with an operator or a group of operators saying this is important to us. And then we build an ecosystem of developers around that through our community efforts, community outreach, and drive it as far as makes sense for the operator, out towards production.


XOS is the project that most directly intersects with OpenStack, tell me a bit about it.

XOS provides a platform that can really bring VNFs together and create services. It’s all about service creation and it does so at the top layer by taking in a TOSCA or maybe even a YANG-type specification. Some way of specifying a service, and so it has a data model that gets driven to do all the changes that are needed. Some of those changes are starting up virtual machines in OpenStack, some are getting virtual networks to interconnect these VNFs. So OpenStack has been a fundamental piece of the infrastructure…XOS is a layer that sees everything as a service and so through OpenStack and the network service or VNFs as a service … Now there’s a lot of interest and there’s a lot of deployments of OpenStack in the operator space, but there’s also instances where operators are not using OpenStack, right? So XOS has to also be able to utilize other infrastructure services, whether it’s Kubernetes or one of the other container ecosystems.

What will the Pipeline mean for XOS?

So the Pipeline provides points where you can do replacements. So in this case, getting virtual service can be done through OpenStack or it could be done through a different service. Same with controllers. We currently use ONOS but there’s no reason that we couldn’t use ODL or open controller or another controller. Whatever an operator wants to use. And then this is where we tie together the Open Network Foundation with the open source software because the intersection between these components and where you can replace them —that’s where you want to standardize some interfaces.

One of the analyst comments about the new Pipeline was that it won’t replace proprietary networking, but put more pressure on traditional networking vendors. If that’s the case, do you think it’s a good or a bad thing?

So I think that there’s truly a need in the marketplace… If any of us want better network services, change has to happen, right? And of course change is painful for the incumbents who perhaps own most of the customers. And so bringing the innovation back in ultimately is painful for the incumbents, but these are very powerful corporations with a lot of smart people. And I wouldn’t count any of them out, right? They will all find ways to cope with this, bring back their value and probably continue growing.

Anything else you want to make sure that OpenStack folks know about the Pipeline or about the ONF?

OpenStack is a really important part of the infrastructures that are in place. In the past, maybe OpenStack networking wasn’t as comprehensive or as good as it could have been. There have been some efforts around making it better. So I think making virtual networking easy as a part of OpenStack would be wonderful.

What’s the main way that Stackers can help with that?

Probably through the same type of thing that we do. Working with an operator or an enterprise to drive a use case that really is demanding on virtual networking and see what can be learned from that.

Get involved!

You can check out the GitHub repository and, if your company is already a member, sign up for access to the wiki, working groups, mailing lists, etc. If you want your company or research organization to join, more here.

You’ll also find the ONF present through demos and workshops at the upcoming Mobile World Congress and Open Networking Summit. For more on workshops and conferences where they participate, check out their events page.

 

Cover Photo // CC BY NC

The post How the Open Networking Foundation pioneers innovation appeared first on OpenStack Superuser.

by Nicole Martinelli at February 22, 2017 01:22 PM

February 21, 2017

Cameron Seader

OpenStack Summit Boston 2017 Presentation Votes (ends Feb. 21st, 2017 at 11:59pm PST)

Open voting is available for all session submissions until Tuesday, Feb 21, 2017 at 11:59PM PST. This is a great way for the community to decide what they want to hear.

I have submitted a handful of sessions which I hope will be voted for. Below are some short summaries and links to their voting pages.

Avoid the storm! Tips on deploying the Enterprise Cloud
The primary driver for enterprise organizations choosing to deploy a private cloud is to enable on-demand access to the resources that the business needs to respond to market opportunities. But business agility requires availability... 
https://www.openstack.org/summit/austin-2016/vote-for-speakers/#/18317
Keys to Successful Data Center Modernization to Infrastructure Agility
Data center modernization and consolidation is the continuous optimization and enhancement of existing data center infrastructure, enabling better support for mission-critical and Mode 1 applications. The companion Key Initiative, "Infrastructure Agility" focuses on Mode 2...
https://www.openstack.org/summit/austin-2016/vote-for-speakers/#/18403
Best Practices with Cloud Native Microservices on OpenStack
It doesn't matter where you're at with your implementation of microservices, but you do need to understand some key fundamentals when it comes to designing and properly deploying on OpenStack. If you're just starting out then you will need to learn some key things such as the common characteristics, monolithic vs. microservice, componentization, decentralized governance, to name a few. In this session you'll learn some of these basics and where to start...
https://www.openstack.org/summit/austin-2016/vote-for-speakers/#/18336
Thanks for your support.
-CS

by Cameron Seader (noreply@blogger.com) at February 21, 2017 11:44 PM

The Official Rackspace Blog

OpenStack and Virtualization: What’s the Difference?

If you’re confused about the differences between OpenStack and virtualization, you’re not alone. They are indeed different, and this post will describe how, review some practical ‘good fit’ use cases for OpenStack, and finally dispel a few myths about this growing open source cloud platform. To get started, a few basics: Virtualization has its roots in partitioning, which divides

The post OpenStack and Virtualization: What’s the Difference? appeared first on The Official Rackspace Blog.

by Walter Bentley at February 21, 2017 05:15 PM

OpenStack Superuser

Deploying Ironic in OpenStack Newton with TripleO

This process should work with any OpenStack Newton platform deployed with TripleO; an already deployed environment updated with the configuration templates described here should also work.

The workflow is based on the upstream documentation.

Architecture setup

With this setup, we can have virtual instances and instances on baremetal nodes in the same environment. In this architecture, I’m using floating IPs with VMs and a provisioning network with the baremetal nodes.

To be able to test this setup in a lab with virtual machines, we can use Libvirt+KVM using VMs for all the nodes in an all-in-one lab. The network topology is described in the diagram below.

Ideally we would have more networks, such as a dedicated network for cleaning the disks and another one for provisioning the baremetal nodes from the Overcloud, with room for an extra one as the tenant network for the baremetal nodes in the Overcloud. For simplicity reasons though, I reused the Undercloud’s provisioning network in this lab for these four network roles:

  • Provisioning from the Undercloud
  • Provisioning from the Overcloud
  • Cleaning the baremetal nodes’ disks
  • Baremetal tenant network for the Overcloud nodes

(Diagram: OVS Libvirt VLANs)

Virtual environment configuration

To test root_device hints on the nodes (Libvirt VMs) that we want to use as baremetal nodes, we must define the first disk in Libvirt with a SCSI bus and a World Wide Identifier:

<disk type='file' device='disk'>
 <driver name='qemu' type='qcow2'/>
 <source file='/var/lib/virtual-machines/overcloud-2-node4-disk1.qcow2'/>
 <target dev='sda' bus='scsi'/>
 <wwn>0x0000000000000001</wwn>
 </disk>

To verify the hints, we can optionally introspect the node in the Undercloud (as currently there’s no introspection in the Overcloud). This is what we see after the introspection:

$ openstack baremetal introspection data save 7740e442-96a6-496c-9bb2-7cac89b6a8e7|jq '.inventory.disks'
[
  {
    "size": 64424509440,
    "rotational": true,
    "vendor": "QEMU",
    "name": "/dev/sda",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x0000000000000001",
    "model": "QEMU HARDDISK",
    "wwn": "0x0000000000000001",
    "serial": "0000000000000001"
  },
  {
    "size": 64424509440,
    "rotational": true,
    "vendor": "0x1af4",
    "name": "/dev/vda",
    "wwn_vendor_extension": null,
    "wwn_with_extension": null,
    "model": "",
    "wwn": null,
    "serial": null
  },
  {
    "size": 64424509440,
    "rotational": true,
    "vendor": "0x1af4",
    "name": "/dev/vdb",
    "wwn_vendor_extension": null,
    "wwn_with_extension": null,
    "model": "",
    "wwn": null,
    "serial": null
  },
  {
    "size": 64424509440,
    "rotational": true,
    "vendor": "0x1af4",
    "name": "/dev/vdc",
    "wwn_vendor_extension": null,
    "wwn_with_extension": null,
    "model": "",
    "wwn": null,
    "serial": null
  }
]

Undercloud templates

The following templates contain all the changes needed to configure Ironic and to adapt the NIC config to have a dedicated OVS bridge for Ironic as required.

Ironic configuration

~/templates/ironic.yaml

parameter_defaults:
    IronicEnabledDrivers:
        - pxe_ssh
    NovaSchedulerDefaultFilters:
        - RetryFilter
        - AggregateInstanceExtraSpecsFilter
        - AvailabilityZoneFilter
        - RamFilter
        - DiskFilter
        - ComputeFilter
        - ComputeCapabilitiesFilter
        - ImagePropertiesFilter
    IronicCleaningDiskErase: metadata
    IronicIPXEEnabled: true
    ControllerExtraConfig:
        ironic::drivers::ssh::libvirt_uri: 'qemu:///system'

Network configuration

First we map an extra bridge called br-baremetal which will be used by Ironic:

~/templates/network-environment.yaml:

[...]
parameter_defaults:
[...]
  NeutronBridgeMappings: datacentre:br-ex,baremetal:br-baremetal
  NeutronFlatNetworks: datacentre,baremetal 

This bridge will be configured in the provisioning network (control plane) of the controllers as we will reuse this network as the Ironic network later. If we wanted to add a dedicated network, we would do the same configuration.

It is important to mention that this Ironic network used for provisioning can’t be VLAN tagged, which is yet another reason to justify using the Undercloud’s provisioning network for this lab:

~/templates/nic-configs/controller.yaml:

[...]
          network_config:
            -
              type: ovs_bridge
              name: br-baremetal
              use_dhcp: false
              members:
                 -
                   type: interface
                   name: eth0
              addresses:
                -
                  ip_netmask:
                    list_join:
                      - '/'
                      - - {get_param: ControlPlaneIp}
                        - {get_param: ControlPlaneSubnetCidr}
              routes:
                -
                  ip_netmask: 169.254.169.254/32
                  next_hop: {get_param: EC2MetadataIp}
[...]

Deployment

This is the deployment script I’ve used. Note there’s a roles_data.yaml template to add a composable role (a new feature in OSP 10) that I used for the deployment of an Operational Tools server (Sensu and Fluentd). The deployment also includes three Ceph nodes. These are irrelevant for the purpose of this setup but I wanted to test it all together in an advanced and more realistic architecture.

Red Hat’s documentation contains the details for configuring these advanced options and the base configuration with the platform director.

~/deployment-scripts/ironic-ha-net-isol-deployment-dupa.sh:

openstack overcloud deploy \
--templates \
-r ~/templates/roles_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e ~/templates/network-environment.yaml \
-e ~/templates/ceph-storage.yaml \
-e ~/templates/parameters.yaml \
-e ~/templates/firstboot/firstboot.yaml \
-e ~/templates/ips-from-pool-all.yaml \
-e ~/templates/fluentd-client.yaml \
-e ~/templates/sensu-client.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml \
-e ~/templates/ironic.yaml \
--control-scale 3 \
--compute-scale 1 \
--ceph-storage-scale 3 \
--compute-flavor compute \
--control-flavor control \
--ceph-storage-flavor ceph-storage \
--timeout 60 \
--libvirt-type kvm

Post-deployment configuration

Verifications

After the deployment completes successfully, we should see how the controllers have the compute service enabled:

$ . overcloudrc
$ openstack compute service list -c Binary -c Host -c State
+------------------+------------------------------------+-------+
| Binary           | Host                               | State |
+------------------+------------------------------------+-------+
| nova-consoleauth | overcloud-controller-1.localdomain | up    |
| nova-scheduler   | overcloud-controller-1.localdomain | up    |
| nova-conductor   | overcloud-controller-1.localdomain | up    |
| nova-compute     | overcloud-controller-1.localdomain | up    |
| nova-consoleauth | overcloud-controller-0.localdomain | up    |
| nova-consoleauth | overcloud-controller-2.localdomain | up    |
| nova-scheduler   | overcloud-controller-0.localdomain | up    |
| nova-scheduler   | overcloud-controller-2.localdomain | up    |
| nova-conductor   | overcloud-controller-0.localdomain | up    |
| nova-conductor   | overcloud-controller-2.localdomain | up    |
| nova-compute     | overcloud-controller-0.localdomain | up    |
| nova-compute     | overcloud-controller-2.localdomain | up    |
| nova-compute     | overcloud-compute-0.localdomain    | up    |
+------------------+------------------------------------+-------+

And the driver we passed with IronicEnabledDrivers is also enabled:

$ openstack baremetal driver list
+---------------------+------------------------------------------------------------------------------------------------------------+
| Supported driver(s) | Active host(s)                                                                                             |
+---------------------+------------------------------------------------------------------------------------------------------------+
| pxe_ssh             | overcloud-controller-0.localdomain, overcloud-controller-1.localdomain, overcloud-controller-2.localdomain |
+---------------------+------------------------------------------------------------------------------------------------------------+

Bare metal network

This network will be:

  • The provisioning network for the Overcloud’s Ironic.
  • The cleaning network for wiping the baremetal node’s disks.
  • The tenant network for the Overcloud’s Ironic instances.

Create the baremetal network in the Overcloud with the same subnet and gateway as the Undercloud’s ctlplane using a different range:

$ . overcloudrc
$ openstack network create \
--share \
--provider-network-type flat \
--provider-physical-network baremetal \
--external \
baremetal
$ openstack subnet create \
--network baremetal \
--subnet-range 192.168.3.0/24 \
--gateway 192.168.3.1 \
--allocation-pool start=192.168.3.150,end=192.168.3.170 \
baremetal-subnet

Then, we need to configure each controller’s /etc/ironic/ironic.conf to use this network to clean the nodes’ disks at registration time and also before tenants use them as baremetal instances:

$ openstack network show baremetal -f value -c id
f7af39df-2576-4042-87c0-14c395ca19b4
$ ssh heat-admin@$CONTROLLER_IP
$ sudo vi /etc/ironic/ironic.conf
cleaning_network_uuid=f7af39df-2576-4042-87c0-14c395ca19b4
$ sudo systemctl restart openstack-ironic-conductor

We should also leave it ready to be included in our next update by adding it to the “ControllerExtraConfig” section in the ironic.yaml template:

parameter_defaults:
  ControllerExtraConfig:
    ironic::conductor::cleaning_network_uuid: f7af39df-2576-4042-87c0-14c395ca19b4

Bare metal deployment images

We can use the same deployment images we use in the Undercloud:

$ openstack image create --public --container-format aki --disk-format aki --file ~/images/ironic-python-agent.kernel deploy-kernel
$ openstack image create --public --container-format ari --disk-format ari --file ~/images/ironic-python-agent.initramfs deploy-ramdisk

We could also create them from the CoreOS images. For example, if we wanted to troubleshoot the deployment, we could use the CoreOS images, enable debug output in the Ironic Python Agent or add our SSH key to access the node during the deployment of the image.
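
For instance (a sketch only; the exact file names depend on how the CoreOS-based Ironic Python Agent images were built or downloaded), uploading them as an alternative pair of deployment images could look like this:

$ openstack image create --public --container-format aki --disk-format aki --file ~/images/coreos_production_pxe.vmlinuz deploy-kernel-coreos
$ openstack image create --public --container-format ari --disk-format ari --file ~/images/coreos_production_pxe_image-oem.cpio.gz deploy-ramdisk-coreos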

Bare metal instance images

Again, for simplicity, we can use the overcloud-full image we use in the Undercloud:

$ KERNEL_ID=$(openstack image create --file ~/images/overcloud-full.vmlinuz --public --container-format aki --disk-format aki -f value -c id overcloud-full.vmlinuz)
$ RAMDISK_ID=$(openstack image create --file ~/images/overcloud-full.initrd --public --container-format ari --disk-format ari -f value -c id overcloud-full.initrd)
$ openstack image create --file ~/images/overcloud-full.qcow2 --public --container-format bare --disk-format qcow2 --property kernel_id=$KERNEL_ID --property ramdisk_id=$RAMDISK_ID overcloud-full

Note that it uses kernel and ramdisk images, as the Overcloud default image is a partition image.

Create flavors

We create two flavors to start with, one for the baremetal instances and another one for the virtual instances.

$ openstack flavor create --ram 1024 --disk 20 --vcpus 1 baremetal
$ openstack flavor create --disk 20 m1.small

Bare metal instances flavor

Then, we set a Boolean property called baremetal in the newly created flavor, which will also be set in the host aggregates (see below) to differentiate nodes for baremetal instances from nodes for virtual instances.

And, as by default the boot_option is netboot, we set it to local (and later we will do the same when we create the baremetal node):

$ openstack flavor set baremetal --property baremetal=true
$ openstack flavor set baremetal --property capabilities:boot_option="local"

Virtual instances flavor

Lastly, we set the flavor for virtual instances with the boolean property set to false:

 $ openstack flavor set m1.small --property baremetal=false

Create host aggregates

To have OpenStack differentiate between baremetal and virtual instances, we can create host aggregates so that the nova-compute service running on the controllers is used just for Ironic and the one on the compute nodes is used for virtual instances:

$ openstack aggregate create --property baremetal=true baremetal-hosts
$ openstack aggregate create --property baremetal=false virtual-hosts
$ for compute in $(openstack hypervisor list -f value -c "Hypervisor Hostname" | grep compute); do openstack aggregate add host virtual-hosts $compute; done
$ openstack aggregate add host baremetal-hosts overcloud-controller-0.localdomain
$ openstack aggregate add host baremetal-hosts overcloud-controller-1.localdomain
$ openstack aggregate add host baremetal-hosts overcloud-controller-2.localdomain

Register the nodes in Ironic

The nodes can be registered with the openstack baremetal create command and a YAML template where the node is defined. In this example, I register only one node (overcloud-2-node4), which I had previously registered in the Undercloud for introspection (and later deleted from it or set to “maintenance” mode to avoid conflicts between the two Ironic services).

The root_device section contains commented examples of the hints we could use. Remember that while configuring the Libvirt XML file for the node above, we added a wwn ID section, which is the one we’ll use in this example.

This template is like the instackenv.json one in the Undercloud, but in YAML.

$ cat overcloud-2-node4.yaml
nodes:
    - name: overcloud-2-node5
      driver: pxe_ssh
      driver_info:
        ssh_username: stack
        ssh_key_contents:  |
          -----BEGIN RSA PRIVATE KEY-----
          MIIEogIBAAKCAQEAxc0a2u18EgTy5y9JvaExDXP2pWuE8Ebyo24AOo1iQoWR7D5n
          fNjkgCeKZRbABhsdoMBmbDMtn0PO3lzI2HnZQBB4BdBZprAiQ1NwKKotUv9puTeY
          [..]
          7DsSKAL4EDqjufY3h+4fRwOcD+EFqlUTDG1sjsSDKjdiHyYMzjcrg8nbaj/M9kAs
          xXnSm9686KxUiCDXO5FWKun204B18mPH1UP20aYw098t6aAQwm4=
          -----END RSA PRIVATE KEY-----
        ssh_virt_type: virsh
        ssh_address: 10.0.0.1
      properties:
        cpus: 4
        memory_mb: 12288
        local_gb: 60
        #boot_option: local (it doesn't set 'capabilities')
        root_device:
          # vendor: "0x1af4"
          # model: "QEMU HARDDISK"
          # size: 64424509440
          wwn: "0x0000000000000001"
          # serial: "0000000000000001"
          # vendor: QEMU
          # name: /dev/sda
      ports:
        - address: 52:54:00:a0:af:da

We create the node using the above template:

$ openstack baremetal create overcloud-2-node4.yaml

Then we have to specify the deployment kernel and ramdisk for the node:

$ DEPLOY_KERNEL=$(openstack image show deploy-kernel -f value -c id)
$ DEPLOY_RAMDISK=$(openstack image show deploy-ramdisk -f value -c id)
$ openstack baremetal node set $(openstack baremetal node show overcloud-2-node4 -f value -c uuid) \
--driver-info deploy_kernel=$DEPLOY_KERNEL \
--driver-info deploy_ramdisk=$DEPLOY_RAMDISK

And lastly, just like we do in the Undercloud, we set the node to available:

$ openstack baremetal node manage $(openstack baremetal node show overcloud-2-node4 -f value -c uuid)
$ openstack baremetal node provide $(openstack baremetal node show overcloud-2-node4 -f value -c uuid)

You can have all of this in a script and run it together every time you register a node.
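
A minimal sketch of such a script, assuming the node name and its YAML definition are passed as arguments, could look like this:

#!/bin/bash
# Hypothetical wrapper around the registration steps above.
# Usage: ./register-node.sh <node-name> <node-yaml>
set -e
NODE_NAME=$1
NODE_YAML=$2

# Register the node from its YAML definition
openstack baremetal create $NODE_YAML

# Point the node at the deployment kernel and ramdisk
DEPLOY_KERNEL=$(openstack image show deploy-kernel -f value -c id)
DEPLOY_RAMDISK=$(openstack image show deploy-ramdisk -f value -c id)
NODE_UUID=$(openstack baremetal node show $NODE_NAME -f value -c uuid)
openstack baremetal node set $NODE_UUID \
  --driver-info deploy_kernel=$DEPLOY_KERNEL \
  --driver-info deploy_ramdisk=$DEPLOY_RAMDISK

# Move the node through manage/provide so it ends up available
openstack baremetal node manage $NODE_UUID
openstack baremetal node provide $NODE_UUID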

If everything has gone well, the node will be registered and Ironic will clean its disk metadata (as per above configuration):

$ openstack baremetal node list -c Name -c "Power State" -c "Provisioning State"
+-------------------+-------------+--------------------+
| Name              | Power State | Provisioning State |
+-------------------+-------------+--------------------+
| overcloud-2-node4 | power off   | cleaning           |
+-------------------+-------------+--------------------+

Wait until the cleaning process has finished and then set the boot_option to local:

$ openstack baremetal node set $(openstack baremetal node show overcloud-2-node4 -f value -c uuid) --property 'capabilities=boot_option:local'

Start a bare metal instance

Just as with the virtual instances, we’ll use an SSH key and then we’ll start the instance with Ironic:

$ openstack keypair create --public-key ~/.ssh/id_rsa.pub stack-key

Then we make sure that the cleaning process has finished (“Provisioning State” is available):

$ openstack baremetal node list -c Name -c "Power State" -c "Provisioning State"
+-------------------+-------------+--------------------+
| Name              | Power State | Provisioning State |
+-------------------+-------------+--------------------+
| overcloud-2-node4 | power off   | available          |
+-------------------+-------------+--------------------+

And we start the baremetal instance:

$ openstack server create \
--image overcloud-full \
--flavor baremetal \
--key-name stack-key \
--nic net-id=$(openstack network show baremetal -f value -c id) \
bm-instance-0

Now check its IP and access the newly created machine:

$ openstack server list -c Name -c Status -c Networks
+---------------+--------+-------------------------+
| Name          | Status | Networks                |
+---------------+--------+-------------------------+
| bm-instance-0 | ACTIVE | baremetal=192.168.3.157 |
+---------------+--------+-------------------------+
$ ssh cloud-user@192.168.3.157
Warning: Permanently added '192.168.3.157' (ECDSA) to the list of known hosts.
Last login: Sun Jan 15 07:49:37 2017 from gateway
[cloud-user@bm-instance-0 ~]$

Start a virtual instance

Optionally, we can start a virtual instance to test whether virtual and baremetal instances are able to reach each other.

As I need to create public and private networks, an image, a router, a security group, a floating IP, etc., I’ll use a Heat template that does it all for me, including creating the virtual instance, so I’ll skip the details of doing this:

$ openstack stack create -e overcloud-env.yaml -t overcloud-template.yaml overcloud-stack

Check that the networks and the instance have been created:

$ openstack network list -c Name
+----------------------------------------------------+
| Name                                               |
+----------------------------------------------------+
| public                                             |
| baremetal                                          |
| HA network tenant 1e6a7de837ad488d8beed626c86a6dfe |
| private-net                                        |
+----------------------------------------------------+
$ openstack server list -c Name -c Networks
+----------------------------------------+------------------------------------+
| Name                                   | Networks                           |
+----------------------------------------+------------------------------------+
| overcloud-stack-instance0-2thafsncdgli | private-net=172.16.2.6, 10.0.0.168 |
| bm-instance-0                          | baremetal=192.168.3.157            |
+----------------------------------------+------------------------------------+

We now have both instances and they can communicate over the network:

$ ssh cirros@10.0.0.168
Warning: Permanently added '10.0.0.168' (RSA) to the list of known hosts.
$ ping 192.168.3.157
PING 192.168.3.157 (192.168.3.157): 56 data bytes
64 bytes from 192.168.3.157: seq=0 ttl=62 time=1.573 ms
64 bytes from 192.168.3.157: seq=1 ttl=62 time=0.914 ms
64 bytes from 192.168.3.157: seq=2 ttl=62 time=1.064 ms
^C
--- 192.168.3.157 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.914/1.183/1.573 ms

This post first appeared on the Tricky Cloud blog. Superuser is always interested in community content, email: editor@superuser.org.

Cover Photo // CC BY NC

The post Deploying Ironic in OpenStack Newton with TripleO appeared first on OpenStack Superuser.

by Ramon Acedo at February 21, 2017 03:35 PM

StackHPC Team Blog

StackHPC at FOSDEM/PGDay 2017

PG Day covered all things Postgres at FOSDEM 2017, and Steve Simpson, one of StackHPC's senior technical leads, presented his thoughts on how some of the advanced features of Postgres could really shine as a backing store for telemetry, logging and monitoring.

As Steve describes in his interview for FOSDEM PG Day, he understands Postgres from the intimate vantage point of having worked with the code base, and gained respect for its implementation under the hood in addition to its capabilities as an RDBMS.

Through exploiting the unique strengths of Postgres, Steve sees an opportunity to both simplify and enhance OpenStack monitoring in one move. He'll be elaborating on his proposed designs and the progress of this project in a StackHPC blog post in due course.

Steve's talk was recorded and slides are available on slideshare.

by Stig Telfer at February 21, 2017 12:40 PM

February 20, 2017

Rob Hirschfeld

Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar)

RackN revisits OpenStack deployments with an eye on ongoing operations.

I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment (my OpenStack BCN preso, Super User follow-up and BOS Proposal) because I felt that the technical hurdles of cloud native architecture would prove challenging. Issues like stable service positioning and persistent data are requirements for OpenStack and hard problems in Kubernetes.

I was wrong: I underestimated how fast these issues could be addressed.

The Kubernetes Helm work out of the AT&T Comm Dev lab takes on the integration with a “do it the K8s native way” approach that the RackN team finds very effective. In fact, we’ve created a fully integrated Digital Rebar deployment that lays down Kubernetes using Kargo and then adds OpenStack via Helm. The provisioning automation includes a Ceph cluster to provide stateful sets for data persistence.

This joint approach dramatically reduces operational challenges associated with running OpenStack without taking over a general purpose Kubernetes infrastructure for a single task.

Video: https://www.youtube.com/embed/6xuVm9PJ2ck

Given the rise of SRE thinking, the RackN team believes that this approach changes the field for OpenStack deployments and will ultimately dominate the field (which is already mainly containerized). There is still work to be completed: some complex configuration is required to allow both Kubernetes CNI and Neutron to collaborate so that containers and VMs can cross-communicate.

We are looking for companies that want to join in this work and fast-track it into production.  If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan.  This approach provides a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Closing note from my past presentations: We’re making progress on the technical aspects of this integration; however, my concerns about market positioning remain.


by Rob H at February 20, 2017 06:53 PM

OpenStack Superuser

Fighting for usability in the open source community

Open source is growing and so are the different uses of it. Industries that, at first glance, seem worlds apart from each other have found common ground in the open source technologies they use to improve their businesses. Consequently, maintaining a quality user experience across the board has become increasingly difficult for open source developers.

“It’s definitely getting more difficult to address all the use cases that are being presented,” says David Lyle, OpenStack Horizon project team lead.

Fortunately, the open source community as a whole is fighting for the users and addressing usability issues to help foster broader adoption.

“One of the misconceptions about engineers is that they don’t care about their users,” says Piet Kruithof, OpenStack User Experience (UX) project team lead. “Within the community, we found that, in fact, that is not the case; engineers care passionately about their users.”

Open-source developers have demonstrated their commitment to increasing usability by putting together teams that address common complaints across their users. OpenStack, for example, launched their UX Project to support and facilitate cross-project efforts to improve the overall user experience of OpenStack.

However, one of the primary issues open source developers have had to address is consistency and, as open source continues to grow, the goal of maintaining consistency within their respective technologies will continue to drive developers.

“You can get into a Ford and know how to drive it, then get into a Mercedes and still know how to drive it,” says Dean Troyer, OpenStackClient project team lead. “There are certain things that just don’t change, and that’s the kind of consistency that we’re talking about.”

To learn more about usability in the open source community, check out the following video, produced by Intel:

Video: https://www.youtube.com/embed/6VhkAHf6Zrs

Cover Photo // CC BY NC

The post Fighting for usability in the open source community appeared first on OpenStack Superuser.

by Isaac Gonzalez at February 20, 2017 02:16 PM

Hugh Blemings

Lwood-20170219

Introduction

Welcome to Last week on OpenStack Dev (“Lwood”) for the week just past. For more background on Lwood, please refer here.

Basic Stats for the week 13 to 19 February for openstack-dev:

~575 Messages (three messages more than the long term average)

~212 Unique threads (up about 18% relative to the long term average)

Traffic picked up a fair bit this week – almost exactly on the long term average for messages.  Threads up a bit more – lots of short threads, a mixture of those about project logos and PTG logistics contributing there I think.

 

Notable Discussions – openstack-dev

Proposed Pike release schedule

Thierry Carrez posted to the list with some information on the proposed Pike release schedule. The human-friendly version is here. Week zero – release week – is the week of August 28.

Assistance sought for the Outreachy program

From Mahati Chamarthy comes an update about the Outreachy program – an initiative that helps folk from underrepresented groups get involved in FOSS. It’s a worthy initiative if ever there was one; a lot of support was shown for it at linux.conf.au recently, as it happens.

Please consider getting involved and/or supporting the program’s work financially.

Session voting open for OpenStack Summit Boston

Erin Disney writes to advise that voting is open for sessions in Boston until 7:59am Wednesday 22nd February (UTC). She notes that unique URLs for submissions have been returned based on community feedback.

Final Team Mascots

A slew of messages this week announcing the final versions of the team mascots that the OpenStack Foundation has been coordinating.  I briefly contemplated listing them all here but that seemed a sub-optimal way to spend the next hour – so if you want to find one for your favourite project, follow this link and use your browser search for “mascot” or “logo” – mostly the former. The Foundation will, I gather, be publishing a canonical list of them all shortly in any case.

In a thread about licensing for the images, kicked off by Graham Hayes, came the clarification that they’ll be CC-BY-ND.

End of Week Wrap-ups, Summaries and Updates

People and Projects

Project Team Lead Election Conclusion and Results

Kendall Nelson summarises the results of the recent PTL elections in a post to the list.  Most Projects had the one PTL nominee, those that went to election were Ironic, Keystone, Neutron, QA and Stable Branch Maintenance.  Full details in Kendall’s message.

Core nominations & changes

Miscellanea

Further reading

Don’t forget these excellent sources of OpenStack news – most recent ones linked in each case

Credits

This weeks edition of Lwood brought to you by Daft Punk (Random Access Memories) and DeeExpus (King of Number 33)

 

by hugh at February 20, 2017 07:30 AM

Opensource.com

Boston summit preview, Ambassador program updates, and more OpenStack news

Are you interested in keeping track of what is happening in the open source cloud? Opensource.com is your source for news in OpenStack, the open source cloud infrastructure project.


OpenStack around the web

From news sites to developer blogs, there's a lot being written about OpenStack every week. Here are a few highlights.

by Jason Baker at February 20, 2017 06:00 AM

February 19, 2017

Cloudwatt

Functionnal SDN testing

Introduction

SDN products are evolving fast. The release cycles can be short and more and more features are added in each cycle. This is clearly a change that network administrators weren’t used to with hardware solutions.

In this context the operational team in charge of the SDN functionality of the platform must be confident when deploying new releases. For that matter the team must be able to test that new builds are iso-functional with the previous build, and detect possible regressions.

Of course SDN vendors are already running end-to-end tests on their releases, but do they test your use cases? And what if you are building the SDN software yourself? You cannot be sure that your build passes the vendor tests. A good idea would be to integrate the vendor functional tests in your CI platform, but it is not always possible. The tests may not be distributed or even runnable outside the vendor infrastructure.

At Cloudwatt we are building our own version of OpenContrail, meaning the OpenContrail upstream branch with backports and sometimes non-upstreamed patches that are specific to our platform. As for OpenContrail functional tests, there is a repository available at https://github.com/Juniper/contrail-test-ci. We tried to run these tests in our CI but it quickly became a nightmare. The tests are open but clearly not suitable to be run on a generic CI platform. In the end we decided to write our own functional tests.

Objectives

The tests we want to run can be summarized in 3 steps:

  1. Deploy an infrastructure with multiple VMs, VNs, SGs, etc…
  2. Generate some traffic with classic network debugging tools (ping, netcat, netperf, scapy…)
  3. Validate that the traffic is going to the right place and has the right shape

As for the global objectives of the tests, we want to be SDN agnostic. We also want to avoid any large customization of the VM images, like setting up agents inside them and having to control them. Ideally we should be able to integrate a complex customer stack and test it with minor modifications. Finally the orchestration must be as simple as possible.

Our solution

Instead of reinventing the wheel, and to keep the tests as KISS as possible, we are using two powerful tools:

  1. Terraform[1] which is used to deploy the infrastructure for the test and can also modify it during the test
  2. Skydive[2] which is used to validate the traffic

In our tests we are not checking the internals of the SDN solution, OpenContrail in our case. We’d like to keep the tests backend-agnostic, and in the end, if the test passes, we can assume that the backend is behaving correctly. Because of that we don’t need a complex setup to run the tests, so they can be run simply from a laptop. Basically you need Terraform and Skydive deployed on the platform. The tools are easy to deploy or install.

Terraform is quite well known. It provides a DSL to describe the infrastructure you wish to deploy on a cloud provider. In our case we are using the OpenStack provider but Terraform can handle other providers as well (AWS, Azure…). The tool is quite comparable to the Heat component in the OpenStack world. The advantage of Terraform over Heat is that you can do incremental updates to your infrastructure.
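
For instance, after editing a .tf file, the pending change can be previewed and then applied incrementally (this is the generic Terraform workflow rather than anything specific to these tests):

$ terraform plan    # show what would change compared to the deployed stack
$ terraform apply   # apply only those changes, leaving the rest untouched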

Skydive, on the other hand, is quite new and not widely used yet. The project aims to provide a tool to debug and troubleshoot network infrastructures, especially SDN platforms. It provides a representation of the network topology (interfaces, links between them) and traffic capture on demand via REST APIs.

In our tests we are using the on-demand capture feature to validate the traffic in the infrastructure.

The “hello world” test

So, what would a test look like with this solution? For example, let’s have a look at a simple security group test.

The goal of the test is to validate that 2 VMs can talk to each other because the SG allows it; then, after removing a rule from the SG, we validate that the traffic is dropped.

Terraform stack

First we need to describe the infrastructure to set up with Terraform.

Gist: https://gist.github.com/eonpatapon/8d10234ef8222b3d1f3b17e240488230

We are booting 2 VMs (sg_vm1, sg_vm2). They are spawned in the same VN (sg_net) and both use the same security group (sg_secgroup) which allows ICMP and SSH traffic.

By using the Nova cloud-init API, a script will run on sg_vm1 that pings sg_vm2 as soon as the VM is booted.

The test itself

Next we will write a small shell script to run a sequence of tasks. If one task fails the whole test should fail.

The tasks to run, in order, would be (see the sketch after this list):

  1. apply the terraform stack on the target environment
  2. start a traffic capture on the sg_vm1 port
  3. poll skydive until we see some ICMP traffic going out and coming back on the interface
  4. remove the ICMP rule of the security-group using terraform
  5. check with skydive that the ICMP traffic is going out the interface but that nothing is coming back
  6. destroy the infrastructure and the skydive capture
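
Sketched as a rough outline (the Skydive interactions are reduced to placeholder comments, since the exact calls depend on the deployment, and the allow_icmp variable is purely illustrative), the driver script could look like this:

#!/bin/bash
set -e  # any failing task aborts the whole test

# 1. deploy the stack described by the .tf files in the current directory
terraform apply

# 2. and 3. start a capture on the sg_vm1 port and poll Skydive until ICMP
#           traffic is seen going out and coming back (placeholder: done
#           through the Skydive REST API)

# 4. remove the ICMP rule from the security group and update the stack
#    incrementally ('allow_icmp' is an illustrative variable name)
terraform apply -var 'allow_icmp=false'

# 5. check with Skydive that ICMP requests leave the interface but that
#    nothing comes back (placeholder again)

# 6. tear down the infrastructure and delete the Skydive capture
terraform destroy -force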

This is the full script with comments:

Gist: https://gist.github.com/eonpatapon/d35b922af6f563407aa8c83beab6c2a8

Doing this with bash probably isn’t the best option, but it shows that with only a few lines we are able to have an end-to-end test. There is also no need for synchronization and no need to contact the VMs directly, which makes things simpler. The VM gets its configuration and the commands to run from the Nova metadata service; then only requests to Skydive are made to ensure the traffic is behaving as it should.

Result of the script:

Gist: https://gist.github.com/eonpatapon/b0ea48a86e6f66361199001090ef05e4

Conclusion

Relying on powerful tools makes our lives easier, and our tests simpler. Instead of developing a complete test framework in-house to do the same, we rely on tools that have good community support.

The glue between these tools is so simple that you could rewrite the last test with some test framework in a day.

Finally, investing time in these tools is worthwhile because they are not just useful for tests but in a lot of other use cases, such as debugging production environments when some bugs pass through the test CI!

  • [1] https://www.terraform.io/
  • [2] http://skydive-project.github.io/skydive/

by Jean Philippe Braun at February 19, 2017 11:00 PM

Thierry Carrez

Using proprietary services to develop open source software

It is now pretty well accepted that open source is a superior way of producing software. Almost everyone is doing open source these days. In particular, the ability for users to look under the hood and make changes results in tools that are better adapted to their workflows. It reduces the cost and risk of finding yourself locked-in with a vendor in an unbalanced relationship. It contributes to a virtuous circle of continuous improvement, blurring the lines between consumers and producers. It enables everyone to remix and invent new things. It adds up to the common human knowledge.

And yet

And yet, a lot of open source software is developed on (and with the help of) proprietary services running closed-source code. Countless open source projects are developed on GitHub, or with the help of Jira for bugtracking, Slack for communications, Google docs for document authoring and sharing, Trello for status boards. That sounds a bit paradoxical and hypocritical -- a bit too much "do what I say, not what I do". Why is that? If we agree that open source has so many tangible benefits, why are we so willing to forfeit them with the very tooling we use to produce it?

But it's free !

The argument usually goes like this: those platforms may be proprietary, they offer great features, and they are provided free of charge to my open source project. Why on Earth would I go through the hassle of setting up, maintaining, and paying for infrastructure to run less featureful solutions? Or why would I pay for someone to host it for me? The trick is, as the saying goes, when the product is free, you are the product. In this case, your open source community is the product. In the worst case scenario, the personal data and activity patterns of your community members will be sold to 3rd parties. In the best case scenario, your open source community is recruited by force in an army that furthers the network effect and makes it even more difficult for the next open source project to not use that proprietary service. In all cases, you, as a project, decide to not bear the direct cost, but ask each and every one of your contributors to pay for it indirectly instead. You force all of your contributors to accept the ever-changing terms of use of the proprietary service in order to participate in your "open" community.

Recognizing the trade-off

It is important to recognize the situation for what it is. A trade-off. On one side, shiny features, convenience. On the other, a lock-in of your community through specific features, data formats, proprietary protocols or just plain old network effect and habit. Each situation is different. In some cases the gap between the proprietary service and the open platform will be so large that it makes sense to bear the cost. Google Docs is pretty good at what it does, and I find myself using it when collaborating on something more complex than etherpads or ethercalcs. At the opposite end of the spectrum, there is really no reason to use Doodle when you can use Framadate. In the same vein, Wekan is close enough to Trello that you should really consider it as well. For Slack vs. Mattermost vs. IRC, the trade-off is more subtle. As a sidenote, the cost of lock-in is a lot reduced when the proprietary service is built on standard protocols. For example, GMail is not that much of a problem because it is easy enough to use IMAP to integrate it (and possibly move away from it in the future). If Slack was just a stellar opinionated client using IRC protocols and servers, it would also not be that much of a problem.

Part of the solution

Any simple answer to this trade-off would be dogmatic. You are not impure if you use proprietary services, and you are not wearing blinders if you use open source software for your project infrastructure. Each community will answer that trade-off differently, based on their roots and history. The important part is to acknowledge that nothing is free. When the choice is made, we all need to be mindful of what we gain, and what we lose. To conclude, I think we can all agree that all other things being equal, when there is an open-source solution which has all the features of the proprietary offering, we all prefer to use that. The corollary is, we all benefit when those open-source solutions get better. So to be part of the solution, consider helping those open source projects build something as good as the proprietary alternative, especially when they are pretty close to it feature-wise. That will make solving that trade-off a lot easier.

by Thierry Carrez at February 19, 2017 01:00 PM

February 18, 2017

Clint Byrum

Free and Open Source Leaders -- You need a President

Recently I was lucky enough to be invited to attend the Linux Foundation Open Source Leadership Summit. The event was stacked with many of the people I consider mentors, friends, and definitely leaders in the various Open Source and Free Software communities that I participate in.

I was able to observe the CNCF Technical Oversight Committee meeting while there, and was impressed at the way they worked toward consensus where possible. It reminded me of the OpenStack Technical Committee in its makeup of well-spoken technical individuals who care about their users and stand up for the technical excellence of their foundations’ activities.

But it struck me (and several other attendees) that this consensus building has limitations. Adam Jacob noted that Linus Torvalds had given an interview on stage earlier in the day where he noted that most of his role was to listen closely for a time to differing opinions, but then stop them when it was clear there was no consensus, and select one that he felt was technically excellent, and move on. Linus, being the founder of Linux and the benevolent dictator of the project for its lifetime thus far, has earned this moral authority.

However, unlike Linux, many of the modern foundation-fostered projects lack an executive branch. The structure we see for governance is centered around ensuring corporations that want to sponsor and rely on development have influence. Foundation members pay dues to get various levels of board seats or corporate access to events and data. And this is a good thing, as it keeps people like me paid to work in these communities.

However, I believe as technical contributors, we sometimes give this too much sway in the actual governance of the community and the projects. These foundation boards know that day to day decision making should be left to those working in the project, and as such allow committees like the CNCF TOC or the OpenStack TC full agency over the technical aspects of the member projects.

I believe these committees operate as a legislative branch. They evaluate conditions and regulate the projects accordingly, allocating budgets for infrastructure and passing edicts to avoid chaos. Since they’re not as large as political legislative bodies like the US House of Representatives & Senate, they can usually operate on a consensus basis, and not drive everything to a contentious vote. By and large, these are as nimble as a legislative body can be.

However, I believe we need an executive to be effective. At some point, we need a single person to listen to the facts, entertain theories, and then decide, and execute a plan. Some projects have natural single leaders like this. Most, however, do not.

I believe we as engineers aren’t generally good at being like Linus. If you’ve spent any time in the corporate world you’ve had an executive disagree with you and run you right over. When we get the chance to distribute power evenly, we do it.

But I think that’s a mistake. I think we should strive to have executives. Not just organizers like the OpenStack PTL, but more like the Debian Project Leader. Empowered people with the responsibility to serve as a visionary and keep the project’s decision making relevant and of high quality. This would also give the board somebody to interact with directly so that they do not have to try and convince the whole community to move in a particular direction to wield influence. In this way, I believe we’d end up with a system of checks and balances similar to the US Constitution.

Checks and Balances

So here is my suggestion for how a project executive structure could work, assuming there is already a strong technical committee and a well defined voting electorate that I call the “active technical contributors”.

  1. The president is elected by Condorcet vote of the active technical contributors of a project for a term of 1 year.

  2. The president will have veto power over any proposed change to the project’s technical assets.

  3. The technical committee may override the president’s veto by a super majority vote.

  4. The president will inform the technical contributors of their plans for the project every 6 months.

This system only works if the project contributors expect their project president to actively drive the vision of the project. Basically, the culture has to turn to this executive for final decision making before it comes to a veto. The veto is for times when the community makes poor decisions. And this doesn’t replace leaders of individual teams. Think of these like the governors of states in the US. They’re running their sub-project inside the parameters set down by the technical committee and the president.

And in the case of foundations or communities with boards, I believe ultimately a board would serve as the judicial branch, checking the legality of changes made against the by-laws of the group. If there’s no board of sorts, a judiciary could be appointed and confirmed, similar to the US supreme court or the Debian CTTE. This would also just be necessary to ensure that the technical arm of a project doesn’t get the foundation into legal trouble of any kind, which is already what foundation boards tend to do.

I’d love to hear your thoughts on this on Twitter, please tweet me @SpamapS with the hashtag #OpenSourcePresident to get the discussion going.

February 18, 2017 12:00 AM

February 17, 2017

RDO

OpenStack Project Team Gathering, Atlanta, 2017

Video: https://www.youtube.com/embed/oOKnJaJI7j8?list=PL27cQhFqK1QzaZL1XrX_CzT7uCOWQ64xM

Over the last several years, OpenStack has conducted OpenStack Summit twice a year. One of these occurs in North America, and the other one alternates between Europe and Asia/Pacific.

This year, OpenStack Summit in North America is in Boston, and the other one will be in Sydney.

This year, though, the OpenStack Foundation is trying something a little different. Whereas in previous years a portion of OpenStack Summit was the developers' summit, where the next version of OpenStack was planned, this year that's been split off into its own separate event called the PTG - the Project Teams Gathering. That's going to be happening next week in Atlanta.

Throughout the week, I'm going to be interviewing engineers who work on OpenStack. Most of these will be people from Red Hat, but I will also be interviewing people from some other organizations, and posting their thoughts about the Ocata release - what they've been working on, and what they'll be working on in the upcoming Pike release, based on their conversations in the coming week at the PTG.

So, follow this channel over the next couple weeks as I start posting those interviews. It's going to take me a while to edit them after next week, of course. But you'll start seeing some of these appear in my YouTube channel over the coming few days.

Thanks, and I look forward to filling you in on what's happening in upstream OpenStack.

by Rich Bowen at February 17, 2017 07:56 PM

Rob Hirschfeld

“Why SRE?” Discussion with Eric @Discoposse Wright

My series on SRE continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

I was a guest on Eric “@discoposse” Wright’s Green Circle Community #42 Podcast (my previous appearance).

LISTEN NOW: Podcast #42

In this action-packed 30-minute conversation, we discuss the industry forces putting pressure on operations teams. These pressures require operators to invest much more heavily in reusable automation.

That leads us towards why Kubernetes is interesting and what went wrong with OpenStack (I actually use the phrase “dumpster fire”). We ultimately talk about how those lessons are embedded in the Digital Rebar architecture.


by Rob H at February 17, 2017 05:40 PM

OpenStack Superuser

Containers on the CERN Cloud

We have recently made the Container-Engine-as-a-Service (Magnum) available in production at CERN as part of the CERN IT department services for the LHC experiments and other CERN communities. This gives the OpenStack cloud users Kubernetes, Mesos and Docker Swarm on demand within the accounting, quota and project permissions structures already implemented for virtual machines.
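
In practice, requesting a cluster through this service looks roughly like the sketch below. This is a hedged example using the magnum command line client of that era; the template name and node count are purely illustrative, not our actual configuration.

# List the cluster templates offered by the service
$ magnum cluster-template-list

# Ask Magnum for a Kubernetes cluster with 100 worker nodes
$ magnum cluster-create --name my-kube-cluster \
    --cluster-template kubernetes --node-count 100

# Once the cluster reaches CREATE_COMPLETE, fetch the config/credentials for kubectl
$ magnum cluster-config my-kube-cluster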

We shared the latest news on the service with the CERN technical staff (link). This is the follow up on the tests presented at the OpenStack Barcelona (link) and covered in the blog from IBM. The work has been helped by collaborations with Rackspace in the framework of the CERN openlab and the European Union Horizon 2020 Indigo Datacloud project.

Performance

At the Barcelona summit, we presented with Rackspace and IBM on the additional performance tests run since the previous blog post. We expanded beyond the 2M requests/s mark to reach around 7M, at which point some network infrastructure issues unrelated to OpenStack limited further scaling.

As we created the clusters, the deployment time increased only slightly with the number of nodes, since most of the work is done in parallel. For clusters of 128 nodes or more, however, the deployment time started to grow almost linearly. At the Barcelona summit, the Heat and Magnum teams worked together on proposals for further improvements in future releases, although a 1,000 node cluster in 23 minutes is still a good result:
Cluster Size (Nodes) | Concurrency | Deployment Time (min)
---------------------|-------------|----------------------
                   2 |          50 |                   2.5
                  16 |          10 |                     4
                  32 |          10 |                     4
                 128 |           5 |                   5.5
                 512 |           1 |                    14
                1000 |           1 |                    23

Storage

With the LHC producing nearly 50PB this year, High Energy Physics relies on some custom storage technologies for specific purposes: EOS for physics data, and CVMFS for read-only, highly replicated content such as application software.

One of the features of providing a private cloud service to the CERN users is to combine the functionality of open source community software such as OpenStack with the specific needs of high energy physics. For these to work, some careful driver work is needed to provide appropriate access while respecting user rights. In particular:

  • EOS provides a disk-based storage system offering high-capacity and low-latency access for users at CERN. Typical use cases are scientists analysing data from the experiments.
  • CVMFS provides a scalable, reliable and low-maintenance file system for read-only data such as software.
There are also other storage solutions we use at CERN, such as:
  • HDFS for long term archiving of data using Hadoop, accessed through an HDFS driver within the container. HDFS works in user space, so no particular integration was required to use it from inside (unprivileged) containers.
  • Cinder provides additional disk space using volumes if the basic flavor does not have enough. This Cinder integration is offered by upstream Magnum, and work was done in the last OpenStack cycle to improve security by adding support for Keystone trusts.
CVMFS was more straightforward as there is no need to authenticate the user. The data is read-only and can be exposed to any container. The access to the file system is provided using a driver (link) which has been adapted to run inside a container. This saves having to run additional software inside the VM hosting the container.
EOS requires authentication through mechanisms such as Kerberos to identify the user and thus determine what files they have access to. Here a container is run per user so that there is no risk of credential sharing. The details are in the driver (link).

Service model

One interesting question that came up during the discussions of the container service was how to deliver the service to the end users. There are several scenarios:
  1. The end user launches a container engine with their specifications but they rely on the IT department to maintain the engine availability. This implies that the VMs running the container engine are not accessible to the end user.
  2. The end user launches the engine within a project that they administer. While the IT department maintains the templates and basic functions such as the Fedora Atomic images, the end user is in control of the upgrades and availability.
  3. A variation of option 2, where the nodes running containers are reachable and managed by the end user, but the container engine master nodes are managed by the IT department. This is similar to the current offer from Google Container Engine and requires some coordination and policies regarding upgrades.
Currently, the default Magnum model corresponds to the second option, and adding option 3 is something we could do in the near future. As users become more interested in consuming containers, we may investigate the first option further.

Applications

Many applications in use at CERN are in the process of being reworked for a microservices-based architecture. A choice of different container engines is attractive for the software developer. One example of this is the file transfer service, which ensures that the network to other high energy physics sites is kept busy but not overloaded with data transfers. The work to containerise this application was described in the recent CHEP 2016 FTS poster.
While deploying containers is an area of great interest for the software community, the key value comes from the physics applications exploiting containers to deliver a new way of working. The Swan project provides a tool for running ROOT, the High Energy Physics application framework, in a browser with easy access to the storage outlined above. A set of examples can be found at https://swan.web.cern.ch/notebook-galleries. With the academic paper, the programs used and the data available from the notebook, this allows easy sharing with other physicists during the review process using CERNBox, CERN’s owncloud based file sharing solution.
Another application being studied is http://opendata.cern.ch/?ln=en which allows the general public to run analyses on LHC open data. Typical applications are Citizen Science and outreach for schools.

Ongoing work

There are a few major items where we are working with the upstream community:
  • Cluster upgrades will allow us to upgrade the container software. Examples of this would be a new version of Fedora Atomic, Docker or the container engine. With a load balancer, this can be performed without downtime (spec)
  • Heterogeneous cluster support will allow nodes to have different flavors (cpu vs gpu, different i/o patterns, different AZs for improved failure scenarios). This is done by splitting the cluster nodes into node groups (blueprint)
  • Cluster monitoring to deploy Prometheus and cAdvisor with Grafana dashboards for easy monitoring of a Magnum cluster (blueprint).

References

This post first appeared on the OpenStack in Production blog. Superuser is always interested in community content, email: editor@superuser.org.

Cover Photo // CC BY NC

The post Containers on the CERN Cloud appeared first on OpenStack Superuser.

by Tim Bell at February 17, 2017 12:21 PM

OpenStack Blog

User Group Newsletter February 2017


Welcome to 2017! We hope you all had a lovely festive season. Here is our first edition of the User Group newsletter for this year.

AMBASSADOR PROGRAM NEWS

2017 sees some new arrivals and departures to our Ambassador program. Read about them here.

 

WELCOME TO OUR NEW USER GROUPS

We have some new user groups which have joined the OpenStack community.

Bangladesh

Ireland – Cork

Russia – St Petersburg

United States – Phoenix, Arizona

Romania – Bucharest

We wish them all the best with their OpenStack journey and can’t wait to see what they will achieve!

Looking for your local group? Are you thinking of starting a user group? Head to the groups portal for more information.


MAY 2017 OPENSTACK SUMMIT

We’re going to Boston for our first summit of 2017!!

You can register and stay updated here.

Consider it your pocket guide for all things Boston summit. Find out about the featured speakers, make your hotel bookings, find your FAQ and read about our travel support program.

 

NEW BOARD OF DIRECTORS
The community has spoken! A new board of directors has been elected for 2017.
Read all about it here. 


MAKE YOUR VOICE HEARD!
Submit your response to the latest OpenStack User Survey!
All data is completely confidential. Submissions close on the 20th of February 2017.
You can complete it here. 

CONTRIBUTING TO UG NEWSLETTER

If you’d like to contribute a news item for next edition, please submit to this etherpad.
Items submitted may be edited down for length, style and suitability.
This newsletter is published on a monthly basis. 

by Sonia Ramza at February 17, 2017 04:47 AM

hastexo

Importing an existing Ceph RBD image into Glance

The normal process of uploading an image into Glance is straightforward: you use glance image-create or openstack image create, or the Horizon dashboard. Whichever process you choose, you select a local file, which you upload into the Glance image store.

This process can be unpleasantly time-consuming when your Glance service is backed with Ceph RBD, for a practical reason. When using the rbd image store, you're expected to use raw images, which have interesting characteristics.

Raw images and sparse files

Most people will take an existing vendor cloud image, which is typically available in the qcow2 format, and convert it using the qemu-img utility, like so:

$ wget -O ubuntu-xenial.qcow2 \
  https://cloud-images.ubuntu.com/xenial/current/xenial-server-cloudimg-amd64-disk1.img
$ qemu-img convert -p -f qcow2 -O raw ubuntu-xenial.qcow2 ubuntu-xenial.raw

On face value, the result looks innocuous enough:

$ qemu-img info ubuntu-xenial.qcow2 
image: ubuntu-xenial.qcow2
file format: qcow2
virtual size: 2.2G (2361393152 bytes)
disk size: 308M
cluster_size: 65536
Format specific information:
    compat: 0.10
    refcount bits: 16

$ qemu-img info ubuntu-xenial.raw
image: ubuntu-xenial.raw
file format: raw
virtual size: 2.2G (2361393152 bytes)
disk size: 1000M

As you can see, in both cases the virtual image size differs starkly from the actual file size. In qcow2, this is due to the copy-on-write nature of the file format and zlib compression; for the raw image, we're dealing with a sparse file:

$ ls -lh ubuntu-xenial.qcow2
-rw-rw-r-- 1 florian florian 308M Feb 17 10:05 ubuntu-xenial.qcow2
$ du -h  ubuntu-xenial.qcow2
308M    ubuntu-xenial.qcow2
$ ls -lh ubuntu-xenial.raw
-rw-r--r-- 1 florian florian 2.2G Feb 17 10:16 ubuntu-xenial.raw
$ du -h  ubuntu-xenial.raw
1000M   ubuntu-xenial.raw

So, while the qcow2 file's physical and logical sizes match, the raw file looks much larger in terms of filesystem metadata than its actual storage utilization. That's because in a sparse file, "holes" (essentially, sequences of null bytes) aren't actually written to the filesystem. Instead, the filesystem just records the position and length of each "hole", and when we read from the "holes" in the file, the read simply returns null bytes again.
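
If you want to see this effect for yourself, here's a quick illustration with an arbitrary file name:

$ truncate -s 2G sparse-demo.img   # writes nothing, just records a 2 GiB "hole"
$ ls -lh sparse-demo.img           # apparent size: 2.0G
$ du -h sparse-demo.img            # actual allocation on disk: 0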

The trouble with sparse files is that RESTful web services, like Glance, don't know too much about them. So, if we were to import that raw file with openstack image create --file my_cloud_image.raw, the command line client would upload null bytes with happy abandon, which would greatly lengthen the process.

Importing images into RBD with qemu-img convert

Luckily for us, qemu-img also allows us to upload directly into RBD. All you need to do is make sure the image goes into the correct pool, and is reasonably named. Glance names uploaded images by their image ID, which is a universally unique identifier (UUID), so let's follow Glance's precedent.

export IMAGE_ID=`uuidgen`
export POOL="glance-images"  # replace with your Glance pool name

qemu-img convert \
  -f qcow2 -O raw \
  my_cloud_image.qcow2 \
  rbd:$POOL/$IMAGE_ID

Creating the clone baseline snapshot

Glance expects a snapshot named snap to exist on any image that is subsequently cloned by Cinder or Nova, so let's create that as well:

rbd snap create $POOL/$IMAGE_ID@snap
rbd snap protect $POOL/$IMAGE_ID@snap
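
If you want to double-check your work at this point, the image and its protected snapshot can be inspected (purely optional):

rbd info $POOL/$IMAGE_ID       # should report the raw image's virtual size
rbd snap ls $POOL/$IMAGE_ID    # should list the protected "snap" snapshot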

Making Glance aware of the image

Finally, we can let Glance know about this image. Now, there's a catch to this: this trick only works with the Glance v1 API, and thus you must use the glance client to do it. Your Glance is v2 only? Sorry. Insist on using the openstack client? Out of luck.

What's special about this invocation of the glance client are simply the pre-populated location and id fields. The location is composed of the following segments:

  • the fixed string rbd://,
  • your Ceph cluster UUID (you get this from ceph fsid),
  • a forward slash (/),
  • the name of your image (which you previously created with uuidgen),
  • another forward slash (/, not @ as you might expect),
  • and finally, the name of your snapshot (snap).

Other than that, the glance client invocation is pretty straightforward for a v1 API call:

CLUSTER_ID=`ceph fsid`
glance --os-image-api-version 1 \
  image-create \
  --disk-format raw \
  --id $IMAGE_ID \
  --location rbd://$CLUSTER_ID/$IMAGE_ID/snap

Of course, you might add other options, like --private or --protected or --name, but the above options are the bare minimum.

And that's it!

Now you can happily fire up VMs, or clone your image into a volume and fire a VM up from that.

by hastexo at February 17, 2017 12:00 AM

February 16, 2017

Ed Leafe

Interop API Requirements

Lately the OpenStack Board of Directors and Technical Committee have placed a lot of emphasis on making OpenStack clouds from various providers “interoperable”. This is a very positive development, after years of different deployments adding various extensions and modifications to the upstream OpenStack code, which had made it hard to define just what it means to offer an “OpenStack Cloud”. So the Interop project (formerly known as DefCore) has been working for the past few years to create a series of objective tests that cloud deployers can run to verify that their cloud meets these interoperability standards.

As a member of the OpenStack API Working Group, though, I’ve had to think a lot about what interop means for an API. I’ll sum up my thoughts, and then try to explain why.

API Interoperability requires that all identical API calls return identical results when made to the same API version on all OpenStack clouds.

This may seem obvious enough, but it has implications that go beyond our current API guidelines. For example, we currently don’t recommend a version increase for changes that add things, such as an additional header or a new URL. After all, no one using the current version will be hurt by this, since they aren’t expecting those new things, and so their code cannot break. But this only considers the effect on a single cloud; when we factor in interoperability, things look very different.

Let’s consider the case where we have two OpenStack-based clouds, both running version 42 of an API. Cloud A is running the released version of the code, while Cloud B is tracking upstream master, which has recently added a new URL (which in the past we’ve said is OK). If we called that new URL on Cloud A, it will return a 404, since that URL had not been defined in the released version of the code. On Cloud B, however, since it is defined on the current code, it will return anything except a 404. So we have two clouds claiming to be running the same version of OpenStack, but making identical calls to them has very different results.
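
As a hypothetical illustration only (the /v42/widgets URL and the cloud endpoints below are invented for this example, not part of any real OpenStack API), the same request behaves differently on the two clouds:

$ curl -s -o /dev/null -w "%{http_code}\n" \
    -H "X-Auth-Token: $TOKEN" https://cloud-a.example.com/v42/widgets
404

$ curl -s -o /dev/null -w "%{http_code}\n" \
    -H "X-Auth-Token: $TOKEN" https://cloud-b.example.com/v42/widgets
200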

Note that when I say “identical” results, I mean structural things, such as response code, format of any body content, and response headers. I don’t mean that it will list the same resources, since it is expected that you can create different resources at will.

I’m sure this will be discussed further at next week’s PTG.

 

by ed at February 16, 2017 11:30 PM

Cloudwatt

5 Minutes Stacks, épisode 53 : iceScrum

Episode 53 : iceScrum

iceScrumlogo

iceScrum is a project management tool following the “agile” method.

This tool gives you a global view of your project, and hence helps with analysis and productivity.

A friendly dashboard shows useful indicators for setting up your project and the last few changes that were made.

iceScrum is fully available through an internet browser and it uses a MySQL database to store all its information.

Preparations

The Versions

  • CoreOS Stable 1185.5
  • iceScrum R6#14.11

The prerequisites to deploy this stack

These should be routine by now:

Size of the instance

By default, the stack deploys on an instance of type “Standard 1” (n1.cw.standard-1). A variety of other instance types exist to suit your various needs, allowing you to pay only for the services you need. Instances are charged by the minute and capped at their monthly price (you can find more details on the Pricing page on the Cloudwatt website).

Stack parameters, of course, are yours to tweak at your fancy.

By the way…

If you do not like command lines, you can go directly to the “run it thru the console” section by clicking here

What will you find in the repository

Once you have cloned the GitHub repository, you will find the following in the blueprint-coreos-icescrum/ directory:

  • blueprint-coreos-icescrum.heat.yml: HEAT orchestration template. It will be used to deploy the necessary infrastructure.
  • stack-start.sh: Stack launching script. This is a small script that will save you some copy-paste.
  • stack-get-url.sh: Floating IP recovery script.

Start-up

Initialize the environment

Have your Cloudwatt credentials in hand and click HERE. If you are not logged in yet, you will go through the authentication screen and then the script download will start. This script will let you initiate shell access to the Cloudwatt APIs.

Source the downloaded file in your shell. Your password will be requested.

$ source COMPUTE-[...]-openrc.sh
Please enter your OpenStack Password:

Once this is done, the OpenStack command line tools can interact with your Cloudwatt user account.

Adjust the parameters

In the blueprint-coreos-icescrum.heat.yml file, you will find at the top a section named parameters. The sole mandatory parameter to adjust is the one called keypair_name. Its default value must contain a valid keypair for your Cloudwatt user account. It is within this same file that you can adjust the instance size by playing with the flavor_name parameter.

heat_template_version: 2015-04-30


description: Blueprint iceScrum


parameters:
  keypair_name:
    description: Keypair to inject in instance
    label: SSH Keypair
    type: string

  flavor_name:
    default: n1.cw.standard-1
    description: Flavor to use for the deployed instance
    type: string
    label: Instance Type (Flavor)
    constraints:
      - allowed_values:
          - n1.cw.standard-1
          - n1.cw.standard-2
          - n1.cw.standard-4
          - n1.cw.standard-8
          - n1.cw.standard-12
          - n1.cw.standard-16

  sqlpass:
    description: password root sql
    type: string
    hidden: true

[...]

Start stack

In a shell, run the stack-start.sh script with the stack name as a parameter:

 ./stack-start.sh iceScrum
 +--------------------------------------+-----------------+--------------------+----------------------+
 | id                                   | stack_name      | stack_status       | creation_time        |
 +--------------------------------------+-----------------+--------------------+----------------------+
 | ee873a3a-a306-4127-8647-4bc80469cec4 | iceScrum        | CREATE_IN_PROGRESS | 2015-11-25T11:03:51Z |
 +--------------------------------------+-----------------+--------------------+----------------------+

Within 5 minutes the stack will be fully operational. (Use watch to see the status in real-time)

 $ watch heat resource-list iceScrum
 +------------------+-----------------------------------------------------+---------------------------------+-----------------+----------------------+
 | resource_name    | physical_resource_id                                | resource_type                   | resource_status | updated_time         |
 +------------------+-----------------------------------------------------+---------------------------------+-----------------+----------------------+
 | floating_ip      | 44dd841f-8570-4f02-a8cc-f21a125cc8aa                | OS::Neutron::FloatingIP         | CREATE_COMPLETE | 2015-11-25T11:03:51Z |
 | security_group   | efead2a2-c91b-470e-a234-58746da6ac22                | OS::Neutron::SecurityGroup      | CREATE_COMPLETE | 2015-11-25T11:03:52Z |
 | network          | 7e142d1b-f660-498d-961a-b03d0aee5cff                | OS::Neutron::Net                | CREATE_COMPLETE | 2015-11-25T11:03:56Z |
 | subnet           | 442b31bf-0d3e-406b-8d5f-7b1b6181a381                | OS::Neutron::Subnet             | CREATE_COMPLETE | 2015-11-25T11:03:57Z |
 | server           | f5b22d22-1cfe-41bb-9e30-4d089285e5e5                | OS::Nova::Server                | CREATE_COMPLETE | 2015-11-25T11:04:00Z |
 | floating_ip_link | 44dd841f-8570-4f02-a8cc-f21a125cc8aa-`floating IP`  | OS::Nova::FloatingIPAssociation | CREATE_COMPLETE | 2015-11-25T11:04:30Z |
   +------------------+-----------------------------------------------------+-------------------------------+-----------------+----------------------

The stack-start.sh script takes care of running the necessary API requests to execute the heat template, which:

  • Starts a CoreOS-based instance running the iceScrum and MySQL Docker containers.
  • Exposes it on the Internet via a floating IP.

All of this is fine, but…

You do not have a way to create the stack from the console?

We do indeed! Using the console, you can deploy iceScrum:

  1. Go the Cloudwatt Github in the applications/blueprint-coreos-icescrum repository
  2. Click on the file named blueprint-coreos-icescrum.heat.yml
  3. Click on RAW, a web page will appear containing purely the template
  4. Save the file to your PC. You can use the default name proposed by your browser (just remove the .txt)
  5. Go to the « Stacks » section of the console
  6. Click on « Launch stack », then « Template file » and select the file you just saved to your PC, and finally click on « NEXT »
  7. Name your stack in the « Stack name » field
  8. Enter the name of your keypair in the « SSH Keypair » field
  9. Write a passphrase that will be used for the database icescrum user
  10. Choose your instance size using the « Instance Type » dropdown and click on « LAUNCH »

The stack will be automatically generated (you can see its progress by clicking on its name). When all modules become green, the creation is complete. You then have to wait about 5 minutes for the software to be ready. You can then go to the “Instances” menu to find the floating IP, or simply refresh the current page and check the Overview tab for a handy link.

If you’ve reached this point, you’re already done! Go enjoy iceScrum!

A one-click deployment sounds really nice…

… Good! Go to the Apps page on the Cloudwatt website, choose the app, press DEPLOY and follow the simple steps… 2 minutes later, a green button appears… ACCESS: you have your iceScrum platform.

Enjoy

Once all this is done, you can connect to your server via SSH using the keypair you downloaded earlier.

You are now in possession of iceScrum; you can access it via the URL http://ip-floatingip. Your full URL is shown in your stack overview in the Horizon Cloudwatt console.
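
If you prefer the command line, the stack-get-url.sh script from the repository can recover the floating IP for you. Here is a sketch, assuming it takes the stack name as its argument like stack-start.sh does, with the heat client as an alternative:

$ ./stack-get-url.sh iceScrum    # hypothetical usage; prints the floating IP
$ heat output-list iceScrum      # or inspect the stack outputs directly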

At your first connection you will be asked to provide some information, including how to access the database. Complete the fields as below (the MySQL URL is jdbc:mysql://mysql:3306/icescrum?useUnicode=true&characterEncoding=utf8); the password is the one you chose when you created the stack.


When setup is completed, you have to restart iceScrum with the following command: docker restart icescrum

So watt?

The goal of this tutorial is to accelerate your start. At this point you are the master of the stack.

You now have an SSH access point on your virtual machine through the floating IP and your private keypair (default username: core).

  • You have access to the web interface via the address specified in your stack output in the Horizon console.

  • Here are some sites to learn more:

    • https://www.icescrum.com/
    • https://www.icescrum.com/documentation/

Have fun. Hack in peace.

by Simon Decaestecker at February 16, 2017 11:00 PM

NFVPE @ Red Hat

Let’s spin up k8s 1.5 on CentOS (with CNI pod networking, too!)

Alright, so you’ve seen my blog post about installing Kubernetes by hand on CentOS, now… Let’s make that easier and do that with an Ansible playbook, specifically my kube-centos-ansible playbook. This time we’ll have Kubernetes 1.5 running on a cluster of 3 VMs, and we’ll use weave as a CNI plugin to handle our pod network. And to make it more fun, we’ll even expose some pods to the ‘outside world’, so we can actually (kinda) do something with them. Ready? Let’s go!

by Doug Smith at February 16, 2017 08:00 PM

OpenStack Superuser

Why commercial open source is like a voyage to mars: The Kubernetes story

LAKE TAHOE, CALIF. — Commercial open source is like the planet Mars: a fascinating environment that holds tremendous promise. It can also be harsh, sterile and difficult to navigate, say two veteran explorers.

Craig McLuckie is a founder of Kubernetes and now CEO of Heptio; Sarah Novotny currently leads the Kubernetes Community Program for Google. Kubernetes, an open source container cluster manager originally designed by Google in 2014, has been called a “secret weapon” in cloud computing.

The pair told a story about the emergence of “open source as a force in the world” as well as a cautionary tale for those embarking on a journey into the far reaches of commercial open source at the Linux Foundation’s Open Source leadership summit.

Craig McLuckie and Sarah Novotny at the Linux Leadership Summit.

They first took a look at the current landscape: these days, software is increasingly central to the success of any business, and open source software is changing the relationship between enterprises and technology. More progressive companies, including banks, are changing the way they engage with software, McLuckie says. They want to put resources into it to make it better, to make it theirs, to make it behave the way they need it to behave, and that ripples into commercial buying decisions.

“You go to a lot of these organizations and increasingly they start to say, ‘Hey, this is open source’ and if it’s not, they’re not interested,” McLuckie says. “If you’re not an open source company, it’s hard times.”

Open source has also been the infrastructure that the internet has used for years and years but as the cloud has changed the infrastructure, and everything becomes a service, infrastructure software is being built in open source and deployed and monetized through cloud providers. “That really and fundamentally has changed how you engage with open source software and how you engage as open source developers,” Novotny says.


Cloud has changed the way open source software is monetized. When people ask McLuckie what was Google’s intent with Kubernetes, he answers, “We’re going to build something awesome. We’re going to monetize the heck out of hosting it. That’s it. That was the plan. It actually worked out really well.” The result was a strong community and a quickly growing product.

That impact is worth understanding, particularly if you’re running a community — doubly so if you’re building a company around an open source technology.

“We’re all in the open source world together,” he says, adding that there is “no finer mechanism for the manifestation of OS than public cloud right now” and citing the example of RDS, which is effectively a strategy to monetize MySQL. “It’s very difficult to point to something more successful than Amazon’s ability to mark up the cost of its infrastructure, overlay a technology like MySQL’s technology and then deliver a premium on it. This is incredibly powerful.”

Monetization is not without its challenges — “like going to Mars and staying alive when you get there,” McLuckie says. There’s an obvious tension in commercial open source between the need for control and the need to build and sustain an open ecosystem. “If you’re thinking about building a technology and building a business where your business is value extraction from the open source community, it’s going to be interesting. You’re going to have some interesting problems.”

McLuckie’s admittedly “non-scientific anecdata” theory is that the reality of return on effort (distros, training, consulting) for many startups in open source can be “bleak.” The first way that startups tend to think about monetization is to create a distro and sell support. This is an easy way to get things rolling: the community finds it, they want to use it, very few people have the expertise to run it and you can make good money packaging it and selling it. But it doesn’t take long until another distro, almost identical, comes out at half the price. It’s easy for your customers to switch, and the rest of your value proposition (getting the technology working, getting it through the community) is now making it easier for your competitor to undercut you. This is also true of licensing (if your fees are high enough, companies start thinking internal engineers are cheaper and can customize more), as well as professional services and training.

Survival is about being lean, McLuckie says. Smart startups can eke out a living if they don’t burn, say, 50 percent of their operating budget on customer acquisition and “look hard” for economies of scale.

How did Kubernetes navigate this territory? “We put together something that’s special. We created this back pressure, that will fight back from monetization,” McLuckie says. “The most focused way to do this is to lock down a community. All organizations belong to us. If you want a fix, it’s coming through our code base.”

Leadership matters; it’s a powerful and healthy form of control. The pair said that when they look at the biggest gap in their communities, it’s not contributors but leaders who can help contributors succeed. Leadership is also about the workaday tasks of managing releases, paying down the “community tax,” and being part of the team first.

“It’s easy to get on the wrong side of the community. It’s hard to get the unit economics right. Make sure you really think through the future of where this is going. And designing an organization, designing a good market strategy that’s grounded in the economics of the day,” he says.

“It’s also about being really smart,” Novotny says. “If you are a business and you think your business is value extraction from an open-source community, you’re going to have hard times. You cannot take more out of an open-source community than you put in.”


Be adaptable, she adds, as the technology changes, as the landscape changes, as the culture changes. “We’ve seen all of these cultural shifts, they all have threads that carry that, and now one of our favorite cultural shifts, of course, is cloud native. And that has such a strong expectation of being mobile. Mobile in the sense of not being locked into any particular vendor, while still being able to get your services to the best possible execution engine. So my hope is that out of this, we will see open source woven across all of the work that we need in our communities.”

Above all, the key is to sell your vision of the future – new territories, unexplored lands.

“The technology is a tool,” McLuckie says. “If you want to create a business, the business has to be about how you use it. You have to sell the dream. You have to think about ways in which that technology is transforming in other people’s businesses.”

Cover photo by: Pascal

The post Why commercial open source is like a voyage to mars: The Kubernetes story appeared first on OpenStack Superuser.

by Nicole Martinelli at February 16, 2017 01:12 PM

Daniel P. Berrangé

Setting up a nested KVM guest for developing & testing PCI device assignment with NUMA

Over the past few years the OpenStack Nova project has gained support for managing VM usage of NUMA, huge pages and PCI device assignment. One of the more challenging aspects of this is the availability of hardware to develop and test against. In the ideal world it would be possible to emulate everything we need using KVM, enabling developers / test infrastructure to exercise the code without needing access to bare metal hardware supporting these features. KVM has long had support for emulating NUMA topology in guests, and a guest OS can use huge pages inside the guest. What was missing were pieces around PCI device assignment, namely IOMMU support and the ability to associate NUMA nodes with PCI devices.

Co-incidentally a QEMU community member was already working on providing emulation of the Intel IOMMU. I made a request to the Red Hat KVM team to fill in the other missing gap related to NUMA / PCI device association. To do this required writing code to emulate a PCI/PCI-E Expander Bridge (PXB) device, which provides a light weight host bridge that can be associated with a NUMA node. Individual PCI devices are then attached to this PXB instead of the main PCI host bridge, thus gaining affinity with a NUMA node.

With this, it is now possible to configure a KVM guest such that it can be used as a virtual host to test NUMA, huge page and PCI device assignment integration. The only real outstanding gap is support for emulating some kind of SRIOV network device, but even without this, it is still possible to test most of the Nova PCI device assignment logic – we’re merely restricted to using physical functions, no virtual functions. This blog post will describe how to configure such a virtual host.

First of all, this requires very new libvirt & QEMU to work, specifically you’ll want libvirt >= 2.3.0 and QEMU 2.7.0. We could technically support earlier QEMU versions too, but that’s pending a patch to libvirt to deal with some command line syntax differences in QEMU for older versions. No currently released Fedora has new enough packages available, so even on Fedora 25, you must enable the “Virtualization Preview” repository on the physical host to try this out – F25 has new enough QEMU, so you just need a libvirt update.

# curl --output /etc/yum.repos.d/fedora-virt-preview.repo https://fedorapeople.org/groups/virt/virt-preview/fedora-virt-preview.repo
# dnf upgrade

For the sake of illustration I’m using Fedora 25 as the OS inside the virtual guest, but any other Linux OS will do just fine. The initial task is to install a guest with 8 GB of RAM & 8 CPUs using virt-install

# cd /var/lib/libvirt/images
# wget -O f25x86_64-boot.iso https://download.fedoraproject.org/pub/fedora/linux/releases/25/Server/x86_64/os/images/boot.iso
# virt-install --name f25x86_64  \
    --file /var/lib/libvirt/images/f25x86_64.img --file-size 20 \
    --cdrom f25x86_64-boot.iso --os-type fedora23 \
    --ram 8000 --vcpus 8 \
    ...

The guest needs to use host CPU passthrough to ensure the guest gets to see VMX, as well as other modern instructions and have 3 virtual NUMA nodes. The first guest NUMA node will have 4 CPUs and 4 GB of RAM, while the second and third NUMA nodes will each have 2 CPUs and 2 GB of RAM. We are just going to let the guest float freely across host NUMA nodes since we don’t care about performance for dev/test, but in production you would certainly pin each guest NUMA node to a distinct host NUMA node.

    ...
    --cpu host,cell0.id=0,cell0.cpus=0-3,cell0.memory=4096000,\
               cell1.id=1,cell1.cpus=4-5,cell1.memory=2048000,\
               cell2.id=2,cell2.cpus=6-7,cell2.memory=2048000 \
    ...

QEMU emulates various different chipsets and historically for x86, the default has been to emulate the ancient PIIX4 (it is 20+ years old dating from circa 1995). Unfortunately this is too ancient to be able to use the Intel IOMMU emulation with, so it is necessary to tell QEMU to emulate the marginally less ancient chipset Q35 (it is only 9 years old, dating from 2007).

    ...
    --machine q35

The complete virt-install command line thus looks like

# virt-install --name f25x86_64  \
    --file /var/lib/libvirt/images/f25x86_64.img --file-size 20 \
    --cdrom f25x86_64-boot.iso --os-type fedora23 \
    --ram 8000 --vcpus 8 \
    --cpu host,cell0.id=0,cell0.cpus=0-3,cell0.memory=4096000,\
               cell1.id=1,cell1.cpus=4-5,cell1.memory=2048000,\
               cell2.id=2,cell2.cpus=6-7,cell2.memory=2048000 \
    --machine q35

Once the installation is completed, shut down this guest since it will be necessary to make a number of changes to the guest XML configuration to enable features that virt-install does not know about, using “virsh edit“. With the use of Q35, the guest XML should initially show three PCI controllers present, a “pcie-root”, a “dmi-to-pci-bridge” and a “pci-bridge”

<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='dmi-to-pci-bridge'>
  <model name='i82801b11-bridge'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
</controller>
<controller type='pci' index='2' model='pci-bridge'>
  <model name='pci-bridge'/>
  <target chassisNr='2'/>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</controller>

PCI endpoint devices are not themselves associated with NUMA nodes, rather the bus they are connected to has affinity. The default pcie-root is not associated with any NUMA node, but extra PCI-E Expander Bridge controllers can be added and associated with a NUMA node. So while in edit mode, add the following to the XML config

<controller type='pci' index='3' model='pcie-expander-bus'>
  <target busNr='180'>
    <node>0</node>
  </target>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</controller>
<controller type='pci' index='4' model='pcie-expander-bus'>
  <target busNr='200'>
    <node>1</node>
  </target>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</controller>
<controller type='pci' index='5' model='pcie-expander-bus'>
  <target busNr='220'>
    <node>2</node>
  </target>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</controller>

It is not possible to plug PCI endpoint devices directly into the PXB, so the next step is to add PCI-E root ports into each PXB – we’ll need one port per device to be added, so 9 ports in total. This is where the requirement for libvirt >= 2.3.0 comes in – earlier versions mistakenly prevented you from adding more than one root port to the PXB.

<controller type='pci' index='6' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='6' port='0x0'/>
  <alias name='pci.6'/>
  <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</controller>
<controller type='pci' index='7' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='7' port='0x8'/>
  <alias name='pci.7'/>
  <address type='pci' domain='0x0000' bus='0x03' slot='0x01' function='0x0'/>
</controller>
<controller type='pci' index='8' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='8' port='0x10'/>
  <alias name='pci.8'/>
  <address type='pci' domain='0x0000' bus='0x03' slot='0x02' function='0x0'/>
</controller>
<controller type='pci' index='9' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='9' port='0x0'/>
  <alias name='pci.9'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</controller>
<controller type='pci' index='10' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='10' port='0x8'/>
  <alias name='pci.10'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x01' function='0x0'/>
</controller>
<controller type='pci' index='11' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='11' port='0x10'/>
  <alias name='pci.11'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x02' function='0x0'/>
</controller>
<controller type='pci' index='12' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='12' port='0x0'/>
  <alias name='pci.12'/>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</controller>
<controller type='pci' index='13' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='13' port='0x8'/>
  <alias name='pci.13'/>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x01' function='0x0'/>
</controller>
<controller type='pci' index='14' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='14' port='0x10'/>
  <alias name='pci.14'/>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x02' function='0x0'/>
</controller>

Notice that the value of the ‘bus‘ attribute on the <address> element matches the value of the ‘index‘ attribute on the <controller> element of the parent device in the topology. The PCI controller topology now looks like this

pcie-root (index == 0)
  |
  +- dmi-to-pci-bridge (index == 1)
  |    |
  |    +- pci-bridge (index == 2)
  |
  +- pcie-expander-bus (index == 3, numa node == 0)
  |    |
  |    +- pcie-root-port (index == 6)
  |    +- pcie-root-port (index == 7)
  |    +- pcie-root-port (index == 8)
  |
  +- pcie-expander-bus (index == 4, numa node == 1)
  |    |
  |    +- pcie-root-port (index == 9)
  |    +- pcie-root-port (index == 10)
  |    +- pcie-root-port (index == 11)
  |
  +- pcie-expander-bus (index == 5, numa node == 2)
       |
       +- pcie-root-port (index == 12)
       +- pcie-root-port (index == 13)
       +- pcie-root-port (index == 14)

All the existing devices are attached to the “pci-bridge” (the controller with index == 2). The devices we intend to use for PCI device assignment inside the virtual host will be attached to the new “pcie-root-port” controllers. We will provide 3 e1000e NICs per NUMA node, so that’s 9 devices in total to add

<interface type='user'>
  <mac address='52:54:00:7e:6e:c6'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:c7'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:c8'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:d6'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:d7'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:d8'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:e6'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:e7'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
</interface>
<interface type='user'>
  <mac address='52:54:00:7e:6e:e8'/>
  <model type='e1000e'/>
  <address type='pci' domain='0x0000' bus='0x0e' slot='0x00' function='0x0'/>
</interface>

Note that we’re using the “user” networking, aka SLIRP. Normally one would never want to use SLIRP but we don’t care about actually sending traffic over these NICs, and so using SLIRP avoids polluting our real host with countless TAP devices.

The final configuration change is to simply add the Intel IOMMU device

<iommu model='intel'/>

It is a capability integrated into the chipset, so it does not need any <address> element of its own. At this point, save the config and start the guest once more. Use the “virsh domifaddrs” command to discover the IP address of the guest’s primary NIC and ssh into it.

# virsh domifaddr f25x86_64
 Name       MAC address          Protocol     Address
-------------------------------------------------------------------------------
 vnet0      52:54:00:10:26:7e    ipv4         192.168.122.3/24

# ssh root@192.168.122.3

We can now do some sanity check that everything visible in the guest matches what was enabled in the libvirt XML config in the host. For example, confirm the NUMA topology shows 3 nodes

# dnf install numactl
# numactl --hardware
available: 3 nodes (0-2)
node 0 cpus: 0 1 2 3
node 0 size: 3856 MB
node 0 free: 3730 MB
node 1 cpus: 4 5
node 1 size: 1969 MB
node 1 free: 1813 MB
node 2 cpus: 6 7
node 2 size: 1967 MB
node 2 free: 1832 MB
node distances:
node   0   1   2 
  0:  10  20  20 
  1:  20  10  20 
  2:  20  20  10 

Confirm that the PCI topology shows the three PCI-E Expander Bridge devices, each with three NICs attached

# lspci -t -v
-+-[0000:dc]-+-00.0-[dd]----00.0  Intel Corporation 82574L Gigabit Network Connection
 |           +-01.0-[de]----00.0  Intel Corporation 82574L Gigabit Network Connection
 |           \-02.0-[df]----00.0  Intel Corporation 82574L Gigabit Network Connection
 +-[0000:c8]-+-00.0-[c9]----00.0  Intel Corporation 82574L Gigabit Network Connection
 |           +-01.0-[ca]----00.0  Intel Corporation 82574L Gigabit Network Connection
 |           \-02.0-[cb]----00.0  Intel Corporation 82574L Gigabit Network Connection
 +-[0000:b4]-+-00.0-[b5]----00.0  Intel Corporation 82574L Gigabit Network Connection
 |           +-01.0-[b6]----00.0  Intel Corporation 82574L Gigabit Network Connection
 |           \-02.0-[b7]----00.0  Intel Corporation 82574L Gigabit Network Connection
 \-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
             +-01.0  Red Hat, Inc. QXL paravirtual graphic card
             +-02.0  Red Hat, Inc. Device 000b
             +-03.0  Red Hat, Inc. Device 000b
             +-04.0  Red Hat, Inc. Device 000b
             +-1d.0  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1
             +-1d.1  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2
             +-1d.2  Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3
             +-1d.7  Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1
             +-1e.0-[01-02]----01.0-[02]--+-01.0  Red Hat, Inc Virtio network device
             |                            +-02.0  Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller
             |                            +-03.0  Red Hat, Inc Virtio console
             |                            +-04.0  Red Hat, Inc Virtio block device
             |                            \-05.0  Red Hat, Inc Virtio memory balloon
             +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
             +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
             \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

The IOMMU support will not be enabled yet as the kernel defaults to leaving it off. To enable it, we must update the kernel command line parameters with grub.

# vi /etc/default/grub
....add "intel_iommu=on"...
# grub2-mkconfig > /etc/grub2.cfg
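
For illustration, after the edit the relevant line in /etc/default/grub typically ends up looking something like this (the other options are placeholders; keep whatever your distro already put there and just append intel_iommu=on):

GRUB_CMDLINE_LINUX="rhgb quiet intel_iommu=on"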

While the intel-iommu device in QEMU can do interrupt remapping, there is no way to enable that feature via libvirt at this time. So we need to set a hack for vfio

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > \
  /etc/modprobe.d/vfio.conf

This is also a good time to install libvirt and KVM inside the guest

# dnf groupinstall "Virtualization"
# dnf install libvirt-client
# rm -f /etc/libvirt/qemu/networks/autostart/default.xml

Note we’re disabling the default libvirt network, since it’ll clash with the IP address range used by this guest. An alternative would be to edit the default.xml to change the IP subnet.
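
Equivalently, once libvirtd is running inside the guest, you could disable the default network with virsh rather than removing the autostart symlink by hand (a minor variation with the same effect):

# virsh net-destroy default
# virsh net-autostart default --disable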

Now reboot the guest. When it comes back up, there should be a /dev/kvm device present in the guest.

# ls -al /dev/kvm
crw-rw-rw-. 1 root kvm 10, 232 Oct  4 12:14 /dev/kvm

If this is not the case, make sure the physical host has nested virtualization enabled for the “kvm-intel” or “kvm-amd” kernel modules.
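
On an Intel physical host, checking and enabling nested virtualization looks roughly like this (run on the physical host, not inside the guest, and reload the module or reboot with all guests shut down for the change to take effect):

# cat /sys/module/kvm_intel/parameters/nested        # "Y" or "1" means enabled
# echo "options kvm_intel nested=1" > /etc/modprobe.d/kvm-nested.conf
# modprobe -r kvm_intel && modprobe kvm_intel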

The IOMMU should have been detected and activated

# dmesg  | grep -i DMAR
[    0.000000] ACPI: DMAR 0x000000007FFE2541 000048 (v01 BOCHS  BXPCDMAR 00000001 BXPC 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.203737] DMAR: Host address width 39
[    0.203739] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.203776] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 12008c22260206 ecap f02
[    2.910862] DMAR: No RMRR found
[    2.910863] DMAR: No ATSR found
[    2.914870] DMAR: dmar0: Using Queued invalidation
[    2.914924] DMAR: Setting RMRR:
[    2.914926] DMAR: Prepare 0-16MiB unity mapping for LPC
[    2.915039] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    2.915140] DMAR: Intel(R) Virtualization Technology for Directed I/O

The key message confirming everything is good is the last line there – if that’s missing, something went wrong – don’t be misled by the earlier “DMAR: IOMMU enabled” line, which merely says the kernel saw the “intel_iommu=on” command line option.

The IOMMU should also have registered the PCI devices into various groups

# dmesg  | grep -i iommu  |grep device
[    2.915212] iommu: Adding device 0000:00:00.0 to group 0
[    2.915226] iommu: Adding device 0000:00:01.0 to group 1
...snip...
[    5.588723] iommu: Adding device 0000:b5:00.0 to group 14
[    5.588737] iommu: Adding device 0000:b6:00.0 to group 15
[    5.588751] iommu: Adding device 0000:b7:00.0 to group 16

Libvirt meanwhile should have detected all the PCI controllers/devices

# virsh nodedev-list --tree
computer
  |
  +- net_lo_00_00_00_00_00_00
  +- pci_0000_00_00_0
  +- pci_0000_00_01_0
  +- pci_0000_00_02_0
  +- pci_0000_00_03_0
  +- pci_0000_00_04_0
  +- pci_0000_00_1d_0
  |   |
  |   +- usb_usb2
  |       |
  |       +- usb_2_0_1_0
  |         
  +- pci_0000_00_1d_1
  |   |
  |   +- usb_usb3
  |       |
  |       +- usb_3_0_1_0
  |         
  +- pci_0000_00_1d_2
  |   |
  |   +- usb_usb4
  |       |
  |       +- usb_4_0_1_0
  |         
  +- pci_0000_00_1d_7
  |   |
  |   +- usb_usb1
  |       |
  |       +- usb_1_0_1_0
  |       +- usb_1_1
  |           |
  |           +- usb_1_1_1_0
  |             
  +- pci_0000_00_1e_0
  |   |
  |   +- pci_0000_01_01_0
  |       |
  |       +- pci_0000_02_01_0
  |       |   |
  |       |   +- net_enp2s1_52_54_00_10_26_7e
  |       |     
  |       +- pci_0000_02_02_0
  |       +- pci_0000_02_03_0
  |       +- pci_0000_02_04_0
  |       +- pci_0000_02_05_0
  |         
  +- pci_0000_00_1f_0
  +- pci_0000_00_1f_2
  |   |
  |   +- scsi_host0
  |   +- scsi_host1
  |   +- scsi_host2
  |   +- scsi_host3
  |   +- scsi_host4
  |   +- scsi_host5
  |     
  +- pci_0000_00_1f_3
  +- pci_0000_b4_00_0
  |   |
  |   +- pci_0000_b5_00_0
  |       |
  |       +- net_enp181s0_52_54_00_7e_6e_c6
  |         
  +- pci_0000_b4_01_0
  |   |
  |   +- pci_0000_b6_00_0
  |       |
  |       +- net_enp182s0_52_54_00_7e_6e_c7
  |         
  +- pci_0000_b4_02_0
  |   |
  |   +- pci_0000_b7_00_0
  |       |
  |       +- net_enp183s0_52_54_00_7e_6e_c8
  |         
  +- pci_0000_c8_00_0
  |   |
  |   +- pci_0000_c9_00_0
  |       |
  |       +- net_enp201s0_52_54_00_7e_6e_d6
  |         
  +- pci_0000_c8_01_0
  |   |
  |   +- pci_0000_ca_00_0
  |       |
  |       +- net_enp202s0_52_54_00_7e_6e_d7
  |         
  +- pci_0000_c8_02_0
  |   |
  |   +- pci_0000_cb_00_0
  |       |
  |       +- net_enp203s0_52_54_00_7e_6e_d8
  |         
  +- pci_0000_dc_00_0
  |   |
  |   +- pci_0000_dd_00_0
  |       |
  |       +- net_enp221s0_52_54_00_7e_6e_e6
  |         
  +- pci_0000_dc_01_0
  |   |
  |   +- pci_0000_de_00_0
  |       |
  |       +- net_enp222s0_52_54_00_7e_6e_e7
  |         
  +- pci_0000_dc_02_0
      |
      +- pci_0000_df_00_0
          |
          +- net_enp223s0_52_54_00_7e_6e_e8

And if you look at a specific PCI device, it should report the NUMA node it is associated with and the IOMMU group it is part of

# virsh nodedev-dumpxml pci_0000_df_00_0
<device>
  <name>pci_0000_df_00_0</name>
  <path>/sys/devices/pci0000:dc/0000:dc:02.0/0000:df:00.0</path>
  <parent>pci_0000_dc_02_0</parent>
  <driver>
    <name>e1000e</name>
  </driver>
  <capability type='pci'>
    <domain>0</domain>
    <bus>223</bus>
    <slot>0</slot>
    <function>0</function>
    <product id='0x10d3'>82574L Gigabit Network Connection</product>
    <vendor id='0x8086'>Intel Corporation</vendor>
    <iommuGroup number='10'>
      <address domain='0x0000' bus='0xdc' slot='0x02' function='0x0'/>
      <address domain='0x0000' bus='0xdf' slot='0x00' function='0x0'/>
    </iommuGroup>
    <numa node='2'/>
    <pci-express>
      <link validity='cap' port='0' speed='2.5' width='1'/>
      <link validity='sta' speed='2.5' width='1'/>
    </pci-express>
  </capability>
</device>

Finally, libvirt should also be reporting the NUMA topology

# virsh capabilities
...snip...
<topology>
  <cells num='3'>
    <cell id='0'>
      <memory unit='KiB'>4014464</memory>
      <pages unit='KiB' size='4'>1003616</pages>
      <pages unit='KiB' size='2048'>0</pages>
      <pages unit='KiB' size='1048576'>0</pages>
      <distances>
        <sibling id='0' value='10'/>
        <sibling id='1' value='20'/>
        <sibling id='2' value='20'/>
      </distances>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
        <cpu id='1' socket_id='1' core_id='0' siblings='1'/>
        <cpu id='2' socket_id='2' core_id='0' siblings='2'/>
        <cpu id='3' socket_id='3' core_id='0' siblings='3'/>
      </cpus>
    </cell>
    <cell id='1'>
      <memory unit='KiB'>2016808</memory>
      <pages unit='KiB' size='4'>504202</pages>
      <pages unit='KiB' size='2048'>0</pages>
      <pages unit='KiB' size='1048576'>0</pages>
      <distances>
        <sibling id='0' value='20'/>
        <sibling id='1' value='10'/>
        <sibling id='2' value='20'/>
      </distances>
      <cpus num='2'>
        <cpu id='4' socket_id='4' core_id='0' siblings='4'/>
        <cpu id='5' socket_id='5' core_id='0' siblings='5'/>
      </cpus>
    </cell>
    <cell id='2'>
      <memory unit='KiB'>2014644</memory>
      <pages unit='KiB' size='4'>503661</pages>
      <pages unit='KiB' size='2048'>0</pages>
      <pages unit='KiB' size='1048576'>0</pages>
      <distances>
        <sibling id='0' value='20'/>
        <sibling id='1' value='20'/>
        <sibling id='2' value='10'/>
      </distances>
      <cpus num='2'>
        <cpu id='6' socket_id='6' core_id='0' siblings='6'/>
        <cpu id='7' socket_id='7' core_id='0' siblings='7'/>
      </cpus>
    </cell>
  </cells>
</topology>
...snip...

Everything should be ready and working at this point, so let's try to install a nested guest and assign it one of the e1000e PCI devices. For simplicity we'll just do the exact same install for the nested guest as we used for the top-level guest we're currently running in. The only difference is that we'll assign it a PCI device.

# cd /var/lib/libvirt/images
# wget -O f25x86_64-boot.iso https://download.fedoraproject.org/pub/fedora/linux/releases/25/Server/x86_64/os/images/boot.iso
# virt-install --name f25x86_64 --ram 2000 --vcpus 8 \
    --file /var/lib/libvirt/images/f25x86_64.img --file-size 10 \
    --cdrom f25x86_64-boot.iso --os-type fedora23 \
    --hostdev pci_0000_df_00_0 --network none

If everything went well, you should now have a nested guest with an assigned PCI device attached to it.
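
To sanity check the result, you can inspect the nested guest's configuration from the level 1 guest; the guest name matches the virt-install command above, and the grep is only a quick illustrative filter, not the only way to verify this.

# virsh dumpxml f25x86_64 | grep -A6 "<hostdev"

The assigned device should show up as a <hostdev> element, and once the nested guest has booted, running lspci inside it should list the emulated 82574L NIC.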

This turned out to be a rather long blog posting, but that is not surprising, as we're experimenting with some cutting-edge KVM features, trying to emulate quite a complicated hardware setup that deviates from a normal KVM guest setup quite a way. Perhaps in the future virt-install will be able to simplify some of this, but at least for the short-to-medium term there'll be a fair bit of work required. The positive thing though is that this has clearly demonstrated that KVM is now advanced enough that you can reasonably expect to do development and testing of features like NUMA and PCI device assignment inside nested guests.

The next step is to convince someone to add QEMU emulation of an Intel SRIOV network device….volunteers please :-)

by Daniel Berrange at February 16, 2017 12:44 PM

ANNOUNCE: libosinfo 1.0.0 release

NB, this blog post was intended to be published back in November last year, but got forgotten in draft stage. Publishing now in case anyone missed the release…

I am happy to announce a new release of libosinfo, version 1.0.0 is now available, signed with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R). All historical releases are available from the project download page.

Changes in this release include:

  • Update loader to follow new layout for external database
  • Move all database files into separate osinfo-db package
  • Move osinfo-db-validate into osinfo-db-tools package

As promised, this release of libosinfo has completed the separation of the library code from the database files. There are now three independently released artefacts:

  • libosinfo – provides the libosinfo shared library and most associated command line tools
  • osinfo-db – contains only the database XML files and RNG schema, no code at all.
  • osinfo-db-tools – a set of command line tools for managing deployment of osinfo-db archives for vendors & users.

Before installing the 1.0.0 release of libosinfo it is necessary to install osinfo-db-tools, followed by osinfo-db. The download page has instructions for how to deploy the three components. In particular note that ‘osinfo-db’ does NOT contain any traditional build system, as the only files it contains are XML database files. So instead of unpacking the osinfo-db archive, use the osinfo-db-import tool to deploy it.
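
As a rough sketch of what a deployment looks like (the archive name below is illustrative; use whatever release you downloaded from the project page):

# osinfo-db-import --local osinfo-db-20161026.tar.xz
# osinfo-query os | head

The osinfo-db-import tool unpacks the archive into the appropriate location (per-user, local or system-wide depending on the flag used), and osinfo-query is a quick way to confirm the database is visible afterwards.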

by Daniel Berrange at February 16, 2017 11:19 AM

Mirantis

Planning for OpenStack Summit Boston begins

The post Planning for OpenStack Summit Boston begins appeared first on Mirantis | Pure Play Open Cloud.

The next OpenStack summit will be held in Boston May 8 through May 11, 2017, and the agenda is in progress.  Mirantis folks, as well as some of our customers, have submitted talks, and we’d like to invite you to take a look, and perhaps to vote to show your support in this process.  The talks include:

While you’re in Boston, consider taking a little extra time in Beantown to take advantage of Mirantis Training’s special Pre-Summit training, which includes a bonus introduction module on the Mirantis Cloud Platform (MCP). You’ll arrive at the summit up to speed with the technology, and even (if you pass the exam) with the OCM100 OpenStack certification. Can’t make it to Boston? You can also take the pre-summit class live from the comfort of your own home (or office).

The post Planning for OpenStack Summit Boston begins appeared first on Mirantis | Pure Play Open Cloud.

by Nick Chase at February 16, 2017 03:25 AM

February 15, 2017

OpenStack Superuser

Security expert: open source must embrace working with the government or else

LAKE TAHOE, CALIF. — At a time when tech companies are locked in an awkward dance with the government, one security expert says that the open-source community must embrace working with lawmakers or face death by regulation.

“We’ve had this special right to code the world as we see fit,” says security guru Bruce Schneier, speaking at the Linux Foundation’s Open Source Leadership Summit. “My guess is that we’re going to lose this right because it’s too dangerous to give it to a bunch of techies.”

Until now, he noted, the industry has left security to the market with decent results, but the tune has changed with the internet of things (IoT). Your connected car, toaster, thermostat and medical devices are turning the world into what amounts to a robot, says Schneier, who appeared via Skype from a darkened room while attending the RSA Conference, making his predictions about the future even more ominous. This “world robot” is not the Terminator type sci-fi fans expect, but rather one with neither a single goal nor a single creator.

[Photo: Bruce Schneier speaking via Skype at the Open Source Leadership Summit.]

“As everything becomes a computer, computer security becomes everything security,” he says. With IoT, the traditional paradigms of security are out of sync, sometimes with disastrous results: the paradigm where things are done right and properly the first time (buildings, cars, medical devices) is colliding with the one (software) where the goal is to be agile and developers can always add patches and updates as vulnerabilities arise. “These two worlds are colliding (literally) now in things like automobiles, medical devices and e-voting.”


Your computer and phone are secure because there are teams of engineers at companies like Apple and Google working to make them secure, he said, holding up his own iPhone. With “smart” devices, there are often external teams who build libraries on the fly and then disband. You also replace your phone every two years, which ensures updated security, but you replace your car every 10 years, your refrigerator every 25 years and your thermostat, well, never.

The effect is colossal: there is a fundamental difference between what happens when a spreadsheet crashes and a car or pacemaker crashes. From the standpoint of security professionals “it’s the same thing, for the rest of the world it’s not.”


That’s where he expects the government to come in. He predicts that the first line of intervention will be through the courts — most likely liabilities and tort law — with congress following.

“Nothing motivates the U.S. government like fear,” he says. So the open-source community must connect with lawmakers because there’s “smart government involvement and stupid government involvement. You can imagine a liability regime that would kill open source.”

His talk was in step with the earlier keynote by Jim Zemlin, the Linux Foundation’s executive director, who said that cybersecurity should be at the forefront of everyone’s agenda.


Schneier made a plea for the open-source community to get involved with policy before it’s too late. He pitched the idea of an IoT security regulatory agency in the hopes of getting new expertise and control over the ever-shifting tech landscape.

“We build tech because it’s cool. We don’t design our future, we just see what happens. We need to make moral and ethical decisions about how we want to work.”

“This is a horribly contentious idea but my worry is that the alternatives aren’t viable any longer,” he said.

 

 

Cover photo: Chris Isherwood

The post Security expert: open source must embrace working with the government or else appeared first on OpenStack Superuser.

by Nicole Martinelli at February 15, 2017 01:19 PM

February 14, 2017

The Official Rackspace Blog

What is OpenStack? The Basics, Part 1

OpenStack. In an increasingly cloud-obsessed world, you’ve probably heard of it. Maybe you’ve read it’s “one of the fastest growing open source communities in the world,” but you’re still not sure what all the hype is about. The aim of this post is to get you from zero to 60 on the basics of OpenStack,

The post What is OpenStack? The Basics, Part 1 appeared first on The Official Rackspace Blog.

by Walter Bentley at February 14, 2017 06:58 PM

Dougal Matthews

Mistral on-success, on-error and on-complete

I spent a bit of time today looking into the subtleties of the Mistral task properties on-success, on-complete and on-error when used with the fail engine commands.

As an upcoming docs patch explains, these are similar to Python's try, except and finally blocks, meaning it would look like the following:

try:
    action()
    # on-success
except:
    # on-error
finally:
    # on-complete

I was looking to see how the Mistral engine command would work in combination with these. In TripleO we want to mark a workflow as failed if it sends a Zaqar message with the value {"status": "FAILED"}. So our task would look a bit like this...

      send_message:
        action: zaqar.queue_post
        input:
          queue_name: <% $.queue_name %>
          messages:
            body:
              status: <% $.get('status', 'SUCCESS') %>
        on-complete:
          - fail: <% $.get('status') = "FAILED" %>

This task uses the zaqar.queue_post action to send a message containing the status. Once it is complete it will fail the workflow if the status is equal to "FAILED". Then in the mistral execution-list the workflow will show as failed. This is good, because we want to surface the best error in the execution list.

However, if the zaqar.queue_post action fails then we want to surface that error instead. At the moment it will still be possible to see it in the list of action executions. However, looking at the workflow executions it isn't obvious where the problem was.

Changing the above example to on-success solves that. We only want to manually mark the workflow as having failed if the Zaqar message was sent with the FAILED status. Otherwise, if the message fails to send, the workflow will error anyway with a more detailed error.
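
For reference, this is a sketch of the same task with that single change applied; it is just the earlier example with on-complete swapped for on-success, so the fail check only runs when the Zaqar post itself succeeded:

      send_message:
        action: zaqar.queue_post
        input:
          queue_name: <% $.queue_name %>
          messages:
            body:
              status: <% $.get('status', 'SUCCESS') %>
        on-success:
          - fail: <% $.get('status') = "FAILED" %>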

by Dougal Matthews at February 14, 2017 04:35 PM

Mirantis

Introduction to Salt and SaltStack

The post Introduction to Salt and SaltStack appeared first on Mirantis | Pure Play Open Cloud.

The amazing world of configuration management software is really well populated these days. You may already have looked at Puppet, Chef or Ansible, but today we focus on SaltStack. Simplicity is at its core, without any compromise on speed or scalability. In fact, some users have 10,000 minions or more. In this article, we’re going to give you a look at what Salt is and how it works.

Salt architecture

Salt remote execution is built on top of an event bus, which makes it unique. It uses a server-agent communication model where the server is called the salt master and the agents the salt minions.

Salt minions receive commands simultaneously from the master and contain everything required to execute commands locally and report back to the salt master. Communication between master and minions happens over a high-performance data pipe that uses ZeroMQ or raw TCP, and messages are serialized using MessagePack to enable fast and light network traffic. Salt uses public keys for authentication with the master daemon, then uses faster AES encryption for payload communication.

States are described using YAML, remote execution is possible over a CLI, and programming or extending Salt isn’t a must.

Salt is heavily pluggable; each function can be replaced by a plugin implemented as a Python module. For example, you can replace the data store, the file server, authentication mechanism, even the state representation. So when I said state representation is done using YAML, I’m talking about the Salt default, which can be replaced by JSON, Jinja, Wempy, Mako, or Py Objects. But don’t freak out. Salt comes with default options for all these things, which enables you to jumpstart the system and customize it when the need arises.

Terminology

It’s easy to be overwhelmed by the obscure vocabulary that Salt introduces, so here are the main Salt concepts that make it unique.

  • salt master – sends commands to minions
  • salt minions – receive commands from the master
  • execution modules – ad hoc commands
  • grains – static information about minions
  • pillar – secure user-defined variables stored on master and assigned to minions (equivalent to data bags in Chef or Hiera in Puppet)
  • formulas (states) – representation of a system configuration, a grouping of one or more state files, possibly with pillar data and configuration files or anything else which defines a neat package for a particular application.
  • mine – area on the master where results from minion executed commands can be stored, such as the IP address of a backend webserver, which can then be used to configure a load balancer
  • top file – matches formulas and pillar data to minions
  • runners – modules executed on the master
  • returners – components that inject minion data to another system
  • renderers – components that run the template to produce the valid state of configuration files. The default renderer uses Jinja2 syntax and outputs YAML files.
  • reactor – component that triggers reactions on events
  • thorium – a new kind of reactor, which is still experimental.
  • beacons – a little piece of code on the minion that listens for events such as server failure or file changes. When it registers one of these events, it informs the master. Reactors are often used to do self-healing.
  • proxy minions – components that translate the Salt language into device-specific instructions in order to bring the device to the desired state using its API, or over SSH.
  • salt cloud – command to bootstrap cloud nodes
  • salt ssh – command to run commands on systems without minions

You’ll find a great overview of all of this on the official docs.
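
To make the formula and top file vocabulary a little more concrete, here is a minimal sketch (the file paths are the Salt defaults, the minion id matches the one used later in this article, and the vim package is just an arbitrary example):

# /srv/salt/vim.sls -- a minimal state (formula) ensuring the vim package is installed
vim:
  pkg.installed: []

# /srv/salt/top.sls -- the top file maps states to minions
base:
  'saltstack-m01':
    - vim

Applying it would then be a matter of running salt 'saltstack-m01' state.apply once the master and minion described below are set up.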

Installation

Salt is built on top of lots of Python modules. Msgpack, YAML, Jinja2, MarkupSafe, ZeroMQ, Tornado, PyCrypto and M2Crypto are all required. To keep your system clean, easily upgradable and to avoid conflicts, the easiest installation workflow is to use system packages.

Salt packages are operating-system specific; in the examples in this article, I’ll be using Ubuntu 16.04 [Xenial Xerus]; for other operating systems, consult the Salt repo page. For simplicity’s sake, you can install the master and the minion on a single machine, and that’s what we’ll be doing here. Later, we’ll talk about how you can add additional minions.

  1. To install the master and the minion, execute the following commands:
    $ sudo su
    # apt-get update
    # apt-get upgrade
    # apt-get install curl wget
    # echo "deb [arch=amd64] http://apt.tcpcloud.eu/nightly xenial tcp-salt" > /etc/apt/sources.list
    # wget -O - http://apt.tcpcloud.eu/public.gpg | sudo apt-key add -
    # apt-get clean
    # apt-get update
    # apt-get install -y salt-master salt-minion reclass
  2. Next, create the directory where you’ll store your state files.
    # mkdir -p /srv/salt
    
  3. You should now have Salt installed on your system, so check to see if everything looks good:
    # salt --version

    You should see a result something like this:

    salt 2016.3.4 (Boron)

Alternative installations

If you can’t find packages for your distribution, you can rely on Salt Bootstrap, which is an alternative installation method, look below for further details.
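
The bootstrap approach boils down to fetching and running a single script; roughly (check the Salt Bootstrap documentation for the current options before relying on this):

$ curl -L https://bootstrap.saltstack.com -o bootstrap-salt.sh
$ sudo sh bootstrap-salt.sh -M    # -M installs the salt-master in addition to the salt-minion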

Configuration

To finish your configuration, you’ll need to execute a few more steps:

  1. If you have firewalls in the way, make sure you open up both port 4505 (the publish port) and 4506 (the return port) to the Salt master to let the minions talk to it.
  2. Now you need to configure your minion to connect to your master. Edit the file /etc/salt/minion.d/minion.conf and change the following lines as indicated below:
    ...
    
    # Set the location of the salt master server. If the master server cannot be
    # resolved, then the minion will fail to start.
    master: localhost
    
    # If multiple masters are specified in the 'master' setting, the default behavior
    # is to always try to connect to them in the order they are listed. If random_master is
    # set to True, the order will be randomized instead. This can be helpful in distributing
    
    ...
    
    # Explicitly declare the id for this minion to use, if left commented the id
    # will be the hostname as returned by the python call: socket.getfqdn()
    # Since salt uses detached ids it is possible to run multiple minions on the
    # same machine but with different ids, this can be useful for salt compute
    # clusters.
    id: saltstack-m01
    
    # Append a domain to a hostname in the event that it does not exist.  This is
    # useful for systems where socket.getfqdn() does not actually result in a
    # FQDN (for instance, Solaris).
    #append_domain:
    ...

    As you can see, we’re telling the minion where to find the master so it can connect — in this case, it’s just localhost, but if that’s not the case for you, you’ll want to change it.  We’ve also given this particular minion an id of saltstack-m01; that’s a completely arbitrary name, so you can use whatever you want.  Just make sure to substitute in the examples!

  3. Before you can play around, you’ll need to restart the Salt services to pick up the changes:
    # service salt-minion restart
    # service salt-master restart
  4. Make sure services are also started at boot time:
    # systemctl enable salt-master.service
    # systemctl enable salt-minion.service
  5. Before the master can do anything on the minion, the master needs to trust it, so accept the corresponding key of each of your minions as follows:
    # salt-key
     Accepted Keys:
     Denied Keys:
     Unaccepted Keys:
     saltstack-m01
     Rejected Keys:
  6. Before accepting it, you can validate that it looks good. First, inspect it:
    # salt-key -f saltstack-m01
     Unaccepted Keys:
     saltstack-m01:  98:f2:e1:9f:b2:b6:0e:fe:cb:70:cd:96:b0:37:51:d0
  7. Then compare it with the minion key:
    # salt-call --local key.finger
     local:
     98:f2:e1:9f:b2:b6:0e:fe:cb:70:cd:96:b0:37:51:d0
  8. It looks the same, so go ahead and accept it:
    # salt-key -a saltstack-m01

Repeat this process of installing salt-minion and accepting the keys to add new minions to your environment. Consult the documentation for more details regarding the configuration of minions, or, more generally, this documentation for all Salt configuration options.

Remote execution

Now that everything’s installed and configured, let’s make sure it’s actually working. The first, most obvious thing we could do with our master/minion infrastructure is to run a command remotely. For example, we can test whether the minion is alive by using the test.ping command:

# salt 'saltstack-m01' test.ping
 saltstack-m01:
     True

As you can see here, we’re calling salt, and we’re feeding it a specific minion, and a command to run on that minion.  We could, if we wanted to, send this command to more than one minion. For example, we could send it to all minions:

# salt '*' test.ping
 saltstack-m01:
     True

In this case, we have only one, but if there were more, salt would cycle through all of them giving you the appropriate response.
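
test.ping is only one of many execution modules; once a minion's key is accepted, any ad hoc command can be run the same way. A few illustrative examples (the targets and package name are arbitrary):

# salt 'saltstack-m01' cmd.run 'uptime'    # run a shell command on one minion
# salt '*' grains.item os osrelease        # query static grains on all minions
# salt '*' pkg.install htop                # install a package everywhere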

So that should get you started. Next time, we’ll look at some of the more complicated things you can do with Salt.

The post Introduction to Salt and SaltStack appeared first on Mirantis | Pure Play Open Cloud.

by Sebastian Braun at February 14, 2017 01:25 PM

OpenStack Superuser

Getting started with Kolla

I’ve been playing with Kolla for about a week, so I thought it’d be good to share my notes with the OpenStack operator community. (Kolla provides production-ready containers and deployment tools for operating OpenStack clouds, including Docker and Ansible.)

Up to stable/newton, Kolla was a single project that lives in the git repository:

In the current master (Ocata not yet released), Kolla is split into two repositories:

So in the current master, you won’t find the directory with the Ansible roles, because that directory is now in the new repository.

There is also a kolla-kubernetes repo, but I haven’t had the chance to look at that yet. I’ll work up a second part to this tutorial about that soon.

My first goal was to deploy OpenStack on top of OpenStack with Kolla. I will use SWITCHengines, which runs OpenStack Mitaka, and I’ll try to deploy OpenStack Newton.

To get started, you need an Operator Seed node, the machine where you actually install Kolla, and from where you can run the kolla-ansible command.

I used Ubuntu Xenial for all my testing. Ubuntu does not yet have packages for Kolla. Instead of just installing a lot of Python stuff with pip and coming up with a deployment that is hard to reproduce, I got a tip on #ubuntu-server to use https://snapcraft.io.

There are already some OpenStack tools packaged with snapcraft:

I looked at what was already done, then I tried to package a snap for Kolla myself:

It worked quite fast, but I needed to write a couple of Kolla patches:

Also, because I had a lot of permission issues, I had to introduce this ugly patch to run all Ansible things as sudo:

In the beginning, I tried to fix it in an elegant way and add become: true only where necessary, but my work collided with someone who was already working on that:

I hope that all these dirty workarounds will be gone by stable/ocata. Apart from these small glitches everything worked pretty well.

For Docker, I used this repo on Xenial: deb http://apt.dockerproject.org/repo ubuntu-xenial main

Understanding high availability

Kolla comes with HA built in. The key idea is to have two front-end servers sharing a public VIP via the VRRP protocol. These front-end nodes run HAProxy in active-backup mode. HAProxy then load-balances the requests for the API services, the database and RabbitMQ across two or more controller nodes in the back end.

In the standard setup the front-end nodes are called network because they act also as Neutron network nodes. The nodes in the back end are called controllers.

Run the playbook

To get started, source your OpenStack config and get a tenant with enough quota and run this Ansible playbook:

cp vars.yaml.template vars.yaml
vim vars.yaml # add your custom config
export ANSIBLE_HOST_KEY_CHECKING=False
source ~/openstack-config
ansible-playbook main.yaml

The Ansible playbook will create the necessary VMs, will hack the /etc/hosts of all VMs so that they are all reachable to each other by name, and will install Kolla on the operator-seed node using my snap package.

To have the frontend VMs share a VIP, I used the approach I found on this blog:

The playbook will configure all the OpenStack networking needed for our tests, and will configure Kolla on the operator node.

Now you can ssh to the operator node and start configuring Kolla. For this easy example, make sure that in /etc/kolla/passwords.yaml you have at least something written for the following values:

database_password:
rabbitmq_password:
rabbitmq_cluster_cookie:
haproxy_password:

If you want, you can also just run kolla-genpwd and this will generate passwords for all the fields in the file.

Now let’s get ready to run Ansible:

export ANSIBLE_HOST_KEY_CHECKING=False
kolla-ansible -i inventory/mariadb bootstrap-servers
kolla-ansible -i inventory/mariadb pull
kolla-ansible -i inventory/mariadb deploy

The example inventory that I have put at /home/ubuntu/inventory/mariadb is a very simplified inventory that will just deploy MariaDB and RabbitMQ. Check what I disabled in /etc/kolla/globals.yml.

Check what is working

With the command:

openstack floating ip list | grep 192.168.22.22

You can check the public floating IP applied to the VIP. Check the OpenStack security groups applied to the front-end VMs. If the necessary ports are open, you should be able to access the MySQL service on port 3306 and the HAProxy admin panel on port 1984. The passwords are the ones in the passwords.yaml file and the username for HAProxy is openstack.
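
As a quick smoke test, assuming you have a mysql client and curl available locally, something like the following should work against the floating IP (the exact database user depends on your Kolla configuration):

$ mysql -h 192.168.22.22 -P 3306 -u root -p    # use database_password from /etc/kolla/passwords.yaml
$ curl -u openstack:<haproxy_password> http://192.168.22.22:1984/    # HAProxy admin panel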

TODO

I will update this file with more steps 🙂 Pull requests are welcome!

 

Saverio Proto is a cloud engineer at SWITCH, a national research and education network in Switzerland, which runs a public cloud for national universities.

Superuser is always interested in community content, email: editor@superuser.org.

The post Getting started with Kolla appeared first on OpenStack Superuser.

by Saverio Proto at February 14, 2017 12:52 PM

February 13, 2017

David Moreau Simard

Announcing the ARA 0.11 release

We’re on the road to version 1.0.0 and we’re getting closer: introducing the release of version 0.11!

Four new contributors (!), 55 commits since 0.10 and 112 files changed for a total of 2,247 additions and 939 deletions.

New features, more stability, better documentation and better test coverage.

The changelog since 0.10.5

  • New feature: The ARA UI version and the Ansible version ARA UI is running with are now shown at the top right
  • New feature: The Ansible version a playbook was run with is now stored and displayed in the playbook reports
  • New feature: New command: “ara generate junit”: generates a junit xml stream of all task results
  • New feature: ara_record now supports two new types: “list” and “dict”, each rendered appropriately in the UI
  • UI: Add ARA logo and favicon
  • UI: Left navigation bar was removed (top navigation bar will be further improved in future versions)
  • Bugfix: CLI commands could sometimes fail when trying to format as JSON or YAML
  • Bugfix: Database and logs now properly default to ARA_DIR if ARA_DIR is changed
  • Bugfix: When using non-ASCII characters (ex: äëö) in playbook files, the web application or static generation could fail
  • Bugfix: Trying to use ara_record to record non strings (ex: lists or dicts) could fail
  • Bugfix: Ansible config: ‘tmppath’ is now a ‘type_value’ instead of a boolean
  • Deprecation: The “ara generate” command was deprecated and moved to “ara generate html”
  • Deprecation: The deprecated callback location, ara/callback has been removed. Use ara/plugins/callbacks.
  • Misc: Various unit and integration testing coverage improvements and optimization
  • Misc: Slowly started working on full python 3 compatibility
  • Misc: ARA now has a logo

ARA now has a logo!

Thanks Jason Rist for the contribution, really appreciate it!

With the icon: [logo with icon]

Without the icon: [full logo]

Taking the newest version of ARA out for a spin

Want to give this new version a try? It’s out on PyPI!

Install dependencies and ARA, configure the Ansible callback location and ansible-playbook your stuff !

Once ARA has recorded your playbook, you’ll be able to fire off and browse the embedded server or generate a static version of the report.
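
A minimal sketch of that workflow might look like the following; the callback path trick follows the ARA 0.x documentation pattern, and the playbook and report paths are arbitrary:

$ pip install ara
$ export ANSIBLE_CALLBACK_PLUGINS=$(python -c "import os, ara; print(os.path.dirname(ara.__file__) + '/plugins/callbacks')")
$ ansible-playbook site.yml              # ARA records the run as it happens
$ ara generate html /tmp/ara-report      # generate a static report, or...
$ ara-manage runserver                   # ...browse the embedded web server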

The road ahead: version 1.0

What is coming in version 1.0? Let me ask you this question: what would you like in 1.0? The development of ARA has mostly been driven by its users’ needs and I’m really excited with what we already have.

I’d like to finish a few things before releasing 1.0… let’s take a sneak peek.

New web user interface

I’ve been working slowly but surely on a complete UI refactor; you can look at an early prototype preview here.

[Embedded video: https://www.youtube.com/embed/h3vY87_EWHw]

Some ideas and concepts have evolved since then, but the general idea is to try to display more information in fewer pages, while not going overboard and having your browser throw up due to the weight of the pages.

Some ARA users are running playbooks involving hundreds of hosts or thousands of tasks and it makes the static generation very slow, large and heavy. While I don’t think I’ll be able to make the static generation work well at any kind of scale, I think we can make this better.

There will have to be a certain point in terms of scale where users will be encouraged to leverage the dynamic web application instead.

Python 3 support

ARA isn’t gating against python3 right now and is actually failing its unit tests when run under python3.

As Ansible is working towards python3 support, ARA needs to be there too.

More complex use case support (stability/maturity)

There are some cases where it’s unclear if ARA works well or works at all. This is probably a matter of stability and maturity.

For example, ARA currently might not behave well when running concurrent ansible-playbook runs from the same node or if a remote database server happens to be on vacation.

More complex use case support might also mean providing users documentation on how to best leverage all the data that ARA records and provides: elasticsearch implementation, junit reports and so on.

If ARA is useful to you, I’d be happy to learn about your use case. Get in touch and let’s chat.

Implement support for ad-hoc ansible run logging

ARA will by default record anything and everything related to ansible-playbook runs. It needs to support ad-hoc ansible commands as well. I want this before tagging 1.0.

Other features

There’s some other features I’d like to see make the cut for version 1.0:

  • Fully featured Ansible role for ARA
  • Store variables and extra variables
  • Provide some level of support for data on a role basis (filter tasks by role, metrics, duration, etc.)
  • Support generating a html or junit report for a specific playbook (rather than the whole thing)
  • Packaging for Debian/Ubuntu and Fedora/CentOS/RHEL

A stretch goal would be to re-write ARA to be properly split between client, server, UI and API — however I’m okay to let that slip for 2.0!

What else would you like to see in ARA? Let me know in the comments, on IRC in #ara on freenode or on Twitter!

by dmsimard at February 13, 2017 04:00 PM

RDO

RDO blogs, week of Feb 13

Here's what RDO enthusiasts have been blogging about in the last few weeks. If you blog about RDO, please let me know (rbowen@redhat.com) so I can add you to my list.

TripleO: Debugging Overcloud Deployment Failure by bregman

You run ‘openstack overcloud deploy’ and after a couple of minutes you find out it failed, and if that’s not enough, then you open the deployment log just to find a very (very!) long output that doesn’t give you a clue as to why the deployment failed. In the following sections we’ll see how can […]

Read more at http://tm3.org/dv

RDO @ DevConf by Rich Bowen

It's been a very busy few weeks in the RDO travel schedule, and we wanted to share some photos with you from RDO's booth at DevConf.cz.

Read more at http://tm3.org/dw

The surprisingly complicated world of disk image sizes by Daniel Berrange

When managing virtual machines one of the key tasks is to understand the utilization of resources being consumed, whether RAM, CPU, network or storage. This post will examine different aspects of managing storage when using file based disk images, as opposed to block storage. When provisioning a virtual machine the tenant user will have an idea of the amount of storage they wish the guest operating system to see for their virtual disks. This is the easy part. It is simply a matter of telling ‘qemu-img’ (or a similar tool) ’40GB’ and it will create a virtual disk image that is visible to the guest OS as a 40GB volume. The virtualization host administrator, however, doesn’t particularly care about what size the guest OS sees. They are instead interested in how much space is (or will be) consumed in the host filesystem storing the image. With this in mind, there are four key figures to consider when managing storage:

Read more at http://tm3.org/dx

Project Leader by rbowen

I was recently asked to write something about the project that I work on – RDO – and one of the questions that was asked was:

Read more at http://tm3.org/dy

os_type property for Windows images on KVM by Tim Bell

The OpenStack images have a long list of properties which can be set to describe the image metadata. The full list is described in the documentation. This blog reviews some of these settings for Windows guests running on KVM, in particular for Windows 7 and Windows 2008R2.

Read more at http://tm3.org/dz

Commenting out XML snippets in libvirt guest config by stashing it as metadata by Daniel Berrange

Libvirt uses XML as the format for configuring objects it manages, including virtual machines. Sometimes when debugging / developing it is desirable to comment out sections of the virtual machine configuration to test some idea. For example, one might want to temporarily remove a secondary disk. It is not always desirable to just delete the configuration entirely, as it may need to be re-added immediately after. XML has support for comments which one might try to use to achieve this. Using comments in XML fed into libvirt, however, will result in an unwelcome surprise – the commented-out text is thrown into /dev/null by libvirt.

Read more at http://tm3.org/d-

Videos from the CentOS Dojo, Brussels, 2017 by Rich Bowen

Last Friday in Brussels, CentOS enthusiasts gathered for the annual CentOS Dojo, right before FOSDEM.

Read more at http://tm3.org/dp

FOSDEM Day 0 - CentOS Dojo by Rich Bowen

FOSDEM starts tomorrow in Brussels, but there's always a number of events the day before.

Read more at http://tm3.org/dq

Gnocchi 3.1 unleashed by Julien Danjou

It's always difficult to know when to release, and we really wanted to do it earlier. But it seems that each week more awesome work was being done in Gnocchi, so we kept delaying it while having no pressure to push it out.

Read more at http://tm3.org/dr

Testing RDO with Tempest: new features in Ocata by ltoscano

The release of Ocata, with its shorter release cycle, is close and it is time to start a broader testing (even if one could argue that it is always time for testing!).

Read more at http://tm3.org/ds

Barely Functional Keystone Deployment with Docker by Adam Young

My eventual goal is to deploy Keystone using Kubernetes. However, I want to understand things from the lowest level on up. Since Kubernetes will be driving Docker for my deployment, I wanted to get things working for a single node Docker deployment before I move on to Kubernetes. As such, you’ll notice I took a few short cuts. Mostly, these involve configuration changes. Since I will need to use Kubernetes for deployment and configuration, I’ll postpone doing it right until I get to that layer. With that caveat, let’s begin.

Read more at http://tm3.org/dt

by Rich Bowen at February 13, 2017 03:35 PM

Gorka Eguileor

iSCSI multipath issues in OpenStack

Multipathing is a technique frequently used in enterprise deployments to increase throughput and reliability on external storage connections, and it’s been a little bit of a pain in the neck for OpenStack users. If you’ve nodded while reading the previous statement, then this post will probably be of interest to you, as we’ll be going […]

by geguileo at February 13, 2017 03:05 PM

OpenStack Superuser

What’s new in the world of OpenStack Ambassadors

The OpenStack Ambassador Program is excited to welcome two new volunteers, Lisa-Marie Namphy and Ilya Alekseyev.

Ambassadors act as liaisons between multiple User Groups, the Foundation and the community in their regions. Launched in 2013, the OpenStack Ambassador program aims to create a framework of community leaders to sustainably expand the reach of OpenStack around the world.

Namphy will be looking after the United States. She first joined the OpenStack community in 2012, while leading the product marketing team for the OpenStack technology initiative at Hewlett-Packard. Later, she became the San Francisco Bay Area OpenStack User Group organizer and has been running it for the past three years.

Prior to becoming an Ambassador, she made considerable contributions to the OpenStack community: a published book on OpenStack, speaking sessions at seven Summits and five OpenStack Days, and dozens of recorded video interviews on OpenStack. She has also taken part in building the SF Bay Area OpenStack community to nearly 6,000 members.

Namphy tells Superuser that she believes 2017 will be the year of OpenStack adoption. Her goal is to encourage user groups and their organizers to make this a priority initiative as they build their communities. Furthermore, she hopes to mentor fellow community leaders in creating robust communities like the San Francisco Bay chapter.

Alekseyev will be looking after Russia and the Commonwealth of Independent States. He started working with OpenStack in December 2010, when he made proof-of-concepts for potential customers at Grid Dynamics, also working with a team to contribute to Nova. In addition, he has coordinated the Russian translation team. Alekseyev also helped launch and organize meetups and conferences devoted to OpenStack in Russia, and helped create User Groups in Moscow, St. Petersburg, Kazan and Kazakhstan.

Alekseyev’s goals include further developing user groups in his region and helping them meet the official User Group requirements. In addition, he hopes to organize OpenStack Days in his region and facilitate relationships with local universities and other open-source and cloud communities, which he hopes will engage new users. Lastly, he will continue his work on promoting OpenStack resources in Russian for the Russian-speaking community.

We’re very excited to have them on board. We also bid farewell to two valued members of our Ambassador team, Kenneth Hui and Sean Roberts. Both of them achieved fantastic things while with us.

Kenneth initiated and mentored the Philadelphia, Maryland and Florida user groups, while also growing the New York City user group to become one of the longest running. In addition, Kenneth contributed to organising the OpenStack Architecture Guide book sprint as well as representing the OpenStack project at various conferences and meet-ups.

Roberts has long been a vocal participant in our meetings and along with Namphy stewarded SFBay, one of the largest OpenStack user groups.

We are grateful for their work and the legacy they will leave for our newcomers to build upon.

If you’re interested in becoming an ambassador, you can apply here:
https://docs.google.com/forms/d/e/1FAIpQLSc0TWAPr32S5CNh0mvYyYH4vZY-rEF6bWUZ9KhwMbXjq3jovw/viewform

Sonia Ramez recently joined the OpenStack Foundation on the community management team as an intern. She’s working on the user group process and the Ambassador Program.

The post What’s new in the world of OpenStack Ambassadors appeared first on OpenStack Superuser.

by Sonia Ramza at February 13, 2017 12:30 PM

Hugh Blemings

Lwood-20170212

Introduction

Welcome to Last week on OpenStack Dev (“Lwood”) for the week just past. For more background on Lwood, please refer here.

Basic Stats for the week 6 to 12 February for openstack-dev:

~348 Messages (down about 39% relative to the long term average)

~124 Unique threads (down about 31% relative to the long term average)

One of those weeks where I wonder if I should ever speculate about what is going to happen with traffic on the list! Much quieter this week relative to average – there does seem to be a trend where traffic falls away a bit around a PTG or Summit, so perhaps it's just a side effect of the proximity to next week's PTG in Boston. Bit of a shorter Lwood as a result.

Notable Discussions – openstack-dev

New OpenStack Security Notices

Users of Glance may be able to replace active image data [OSSN-0065]

From the summary: “When Glance has been configured with the “show_multiple_locations” option enabled with default policy for set and delete locations, it is possible for a non-admin user having write access to the image metadata to replace active image data.”

What is your favourite/most embarrassing IRC gaffe ?

So asks Kendall Nelson in his email – he’s gathering stories from the community as part of an article he’s writing. In fairness I won’t risk inadvertently stealing his thunder by repeating or summarising the stories here but if you want something to brighten your morning/afternoon, have a quick peek at the thread :)

End of Week Wrap-ups, Summaries and Updates

People and Projects

Project Team Lead Election Conclusion and Results

Kendall Nelson summarises the results of the recent PTL elections in a post to the list. Most projects had a single PTL nominee; those that went to election were Ironic, Keystone, Neutron, QA and Stable Branch Maintenance. Full details in Kendall's message.

Core nominations & changes

A quiet week this week other than the PTL elections winding up

  • [Dragonflow] Nominating Xiao Hong Hui for core of Dragonflow – Omer Anson

Miscellanea

Further reading

Don’t forget these excellent sources of OpenStack news – most recent ones linked in each case

Credits

This weeks edition of Lwood brought to you by Bruce Hornsby (Scenes from the Southside) and Bruce Springsteen (Greatest Hits).

In this, my first Lwood post-Rackspace, I place on record my thanks to the Rack for a great few years and, of course, for supporting the production of Lwood as part of my role there. I intend to continue writing Lwood for the foreseeable future, modulo what my new (yet to be determined) gig might entail :)

 

by hugh at February 13, 2017 08:16 AM

February 12, 2017

Flavio Percoco

On communities: When should change happen?

One common rule of engineering (and not only engineering, really) is that you don't change something that is not broken. In this context, broken doesn't only refer to totally broken things. It could refer to a piece of code becoming obsolete, or a part of the software not performing well anymore, etc. The point is that it doesn't matter how sexy the change you want to make is, if there's no good reason to make it, then don't. Because the moment you do, you'll break what isn't broken (or known to be broken, at the very least).

Good practices are good for some things, not everything and even the one mentioned above is not an exception. Trying to apply this practice to everything in our lives and everywhere in our jobs is not going to bring the results one would expect. We will soon end up with stalled processes or even worse, as it's the case for communities, we may be dictating the death of the thing we are applying this practice on.

When it comes to communities, I am a strong believer that the sooner we try to improve things, the more we will avoid future issues that could damage our community. If we know there are things that can be improved and we don't do it because there are no signs of the community being broken, we will, in most cases, be damaging the community. Hopefully the example below will help illustrate the point I'm making.

Take OpenStack as an example. It's a fully distributed community with people from pretty much everywhere in the world. What this really means is that there are people from different cultures, whose first language is not English, who live in different timezones. One common issue with every new team in OpenStack is finding the best way to communicate across the team. Should the team use IRC? Should the team try video first? Should the team do both? What time is the best time to meet? etc.

The de facto standard means of communication for instant messaging in OpenStack is IRC. It's accessible from everywhere, it's written, it's logged and it's open. It has been around for ages and it has been used by the community since the very beginning. Some teams, however, have chosen video over IRC because it's just faster. The amount that can be covered in a 1h-long call is normally more than what can be covered in a 1h-long IRC meeting. For some people it's just easier and faster to talk. For some people. Not everyone, just some people. The community is distributed and diverse, remember?

Now, without getting into the details of whether IRC is better than video calls, let's assume a new (or existing team) decides to start doing video calls. Let's also assume that the technology used is accessible everywhere (no G+ because it is blocked in China, for example) and that the video calls are recorded and made public. For the current size and members of the hypothetical team, video calls are ok. Members feel comfortable and they can all attend at a reasonable time. Technically, there's nothing broken with this setup. Technically, the team could keep using video calls until something happens, until someone actually complains, until something breaks.

This is exactly where problems begin. In a multi-cultural environment we ought to consider that not everyone is used to speaking up and complaining. While I agree the best way to improve a community is by people speaking up, we also have to take into account those who don't do it because they are just not used to it. Based on the scenario described above, these folks are still not part of the project's team and they likely won't be because in order for them to participate in the community, they would have to give up part of who they are.

For the sake of discussion, let's assume that these folks can attend the call but they are not native English speakers. At this point the problem becomes the language barrier. The language barrier is always higher than your level of extroversion. Meaning, you can be a very extroverted person, but not being able to speak the language fluently will leave you out of some discussions, which will likely end in frustration. Written forms of expression are easier than spoken ones. Our brain has more time to process them, reason about them and apply/correct the logic before it even tries to come out of our fingers. The same is not true for spoken communication.

I don't want to get too hung up on the video vs IRC discussion, to be honest. The point made is that, when it comes to communities, waiting for people to complain, or for things to be broken, is the wrong approach. Sit down and reflect how you can make the community better, what things are slowing down its growth and what changes would help you be more inclusive. Waiting until there is an actual problem may be the death of your community. The last thing you want to do is to drive the wrong people away.

If you liked this post, you may also like:

by Flavio Percoco at February 12, 2017 11:00 PM

Arie Bregman

TripleO: Debugging Overcloud Deployment Failure

You run ‘openstack overcloud deploy’ and after a couple of minutes you find out it failed, and if that’s not enough, then you open the deployment log just to find a very (very!) long output that doesn’t give you a clue as to why the deployment failed. In the following sections we’ll see how can […]

by bregman at February 12, 2017 08:45 PM

February 10, 2017

RDO

RDO @ DevConf

It's been a very busy few weeks in the RDO travel schedule, and we wanted to share some photos with you from RDO's booth at DevConf.cz.

[Photo: RDO @ DevConf]

Led by Eliska Malikova, and supported by our team of RDO engineers, we provided information about RDO and OpenStack, as well as a few impromptu musical performances.

[Photo: RDO @ DevConf]

RDO engineers spun up a small RDO cloud, and later in the day, the people from the ManageIQ booth next door set up an instance of their software to manage that cloud, showing that RDO and ManageIQ are better together.

You can see the full album of photos on Flickr.

If you have photos or stories from DevConf, please share them with us on rdo-list. Thanks!

by Rich Bowen at February 10, 2017 09:10 PM

Daniel P. Berrangé

The surprisingly complicated world of disk image sizes

When managing virtual machines one of the key tasks is to understand the utilization of resources being consumed, whether RAM, CPU, network or storage. This post will examine different aspects of managing storage when using file based disk images, as opposed to block storage. When provisioning a virtual machine the tenant user will have an idea of the amount of storage they wish the guest operating system to see for their virtual disks. This is the easy part. It is simply a matter of telling ‘qemu-img’ (or a similar tool) ’40GB’ and it will create a virtual disk image that is visible to the guest OS as a 40GB volume. The virtualization host administrator, however, doesn’t particularly care about what size the guest OS sees. They are instead interested in how much space is (or will be) consumed in the host filesystem storing the image. With this in mind, there are four key figures to consider when managing storage:

  • Capacity – the size that is visible to the guest OS
  • Length – the current highest byte offset in the file.
  • Allocation – the amount of storage that is currently consumed.
  • Commitment – the amount of storage that could be consumed in the future.

The relationship between these figures will vary according to the format of the disk image file being used. For the sake of illustration, raw and qcow2 files will be compared, since they provide examples of the simplest file format and the most complicated file format used for virtual machines.

Raw files

In a raw file, the sectors visible to the guest are mapped 1-to-1 onto sectors in the host file. Thus the capacity and length values will always be identical for raw files – the length dictates the capacity and vice-versa. The allocation value is slightly more complicated. Most filesystems do lazy allocation on blocks, so even if a file is 10 GB in length it is entirely possible for it to consume 0 bytes of physical storage, if nothing has been written to the file yet. Such a file is known as “sparse” or is said to have “holes” in its allocation. To maximize guest performance, it is common to tell the operating system to fully allocate a file at time of creation, either by writing zeros to every block (very slow) or via a special system call to instruct it to immediately allocate all blocks (very fast). So immediately after creating a new raw file, the allocation would typically either match the length, or be zero. In the latter case, as the guest writes to various disk sectors, the allocation of the raw file will grow. The commitment value refers to the upper bound for the allocation value, and for raw files, this will match the length of the file.

While raw files look reasonably straightforward, some filesystems can create surprises. XFS has a concept of “speculative preallocation” where it may allocate more blocks than are actually needed to satisfy the current I/O operation. This is useful for files which are progressively growing, since it is faster to allocate 10 blocks all at once than to allocate 10 blocks individually. So while a raw file’s allocation will usually never exceed the length, if XFS has speculatively preallocated extra blocks, it is possible for the allocation to exceed the length. The excess is usually pretty small though – bytes or KBs, not MBs. Btrfs meanwhile has a concept of “copy on write” whereby multiple files can initially share allocated blocks and when one file is written, it will take a private copy of the blocks written. IOW, to determine the usage of a set of files it is not sufficient to sum the allocation for each file as that would over-count the true allocation due to block sharing.
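
A quick way to see the difference between length and allocation for a sparse raw file is to compare ls (which reports the length) with du (which reports the blocks actually allocated); the file name and size here are arbitrary:

# qemu-img create -f raw demo.img 40G
# ls -lh demo.img    # length: 40G
# du -sh demo.img    # allocation: roughly zero until data is written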

QCow2 files

In a qcow2 file, the sectors visible to the guest are indirectly mapped to sectors in the host file via a number of lookup tables. A sector at offset 4096 in the guest may be stored at offset 65536 in the host. In order to perform this mapping, there are various auxiliary data structures stored in the qcow2 file. Describing all of these structures is beyond the scope of this post; read the specification instead. The key point is that, unlike raw files, the length of the file in the host has no relation to the capacity seen in the guest. The capacity is determined by a value stored in the file header metadata. By default, the qcow2 file will grow on demand, so the length of the file will gradually grow as more data is stored. It is possible to request preallocation, either just of file metadata, or of the full file payload too. Since the file grows on demand as data is written, traditionally it would never have any holes in it, so the allocation would always match the length (the previous caveat wrt XFS speculative preallocation still applies though). Since the introduction of SSDs, however, the notion of explicitly cutting holes in files has become commonplace. When this is plumbed through from the guest, a guest-initiated TRIM request will in turn create a hole in the qcow2 file, which will also issue a TRIM to the underlying host storage. Thus even though qcow2 files grow on demand, they may also become sparse over time, thus allocation may be less than the length. The maximum commitment for a qcow2 file is surprisingly hard to get an accurate answer to. To calculate it requires intimate knowledge of the qcow2 file format and even the type of data stored in it. There is allocation overhead from the data structures used to map guest sectors to host file offsets, which is directly proportional to the capacity and the qcow2 cluster size (a cluster is the qcow2 equivalent of the “sector” concept, except much bigger – 65536 bytes by default). Over time qcow2 has grown other data structures though, such as various bitmap tables tracking cluster allocation and recent writes. With the addition of LUKS support, there will be key data tables. Most significant though is that qcow2 can internally store entire VM snapshots containing the virtual device state, guest RAM and copy-on-write disk sectors. If snapshots are ignored, it is possible to calculate a value for the commitment, and it will be proportional to the capacity. If snapshots are used, however, all bets are off – the amount of storage that can be consumed is unbounded, so there is no commitment value that can be accurately calculated.
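
For qcow2 the same comparison is easiest with qemu-img info, which reports the capacity as "virtual size" and the allocation as "disk size"; again the file name is arbitrary:

# qemu-img create -f qcow2 demo.qcow2 40G
# qemu-img info demo.qcow2    # "virtual size" = capacity, "disk size" = allocation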

Summary

Considering the above information, for a newly created file the four size values would look like

Format                     Capacity  Length  Allocation  Commitment
raw (sparse)               40GB      40GB    0           40GB [1]
raw (prealloc)             40GB      40GB    40GB [1]    40GB [1]
qcow2 (grow on demand)     40GB      193KB   196KB       41GB [2]
qcow2 (prealloc metadata)  40GB      41GB    6.5MB       41GB [2]
qcow2 (prealloc all)       40GB      41GB    41GB        41GB [2]

[1] XFS speculative preallocation may cause allocation/commitment to be very slightly higher than 40GB
[2] use of internal snapshots may massively increase allocation/commitment

For an application attempting to manage filesystem storage to ensure any future guest OS write will always succeed without triggering ENOSPC (out of space) in the host, the commitment value is critical to understand. If the length/allocation values are initially less than the commitment, they will grow towards it as the guest writes data. For raw files it is easy to determine commitment (XFS preallocation aside), but for qcow2 files it is unreasonably hard. Even ignoring internal snapshots, there is no API provided by libvirt that reports this value, nor is it exposed by QEMU or its tools. Determining the commitment for a qcow2 file requires the application to not only understand the qcow2 file format, but also directly query the header metadata to read internal parameters such as “cluster size” to be able to then calculate the required value. Without this, the best an application can do is to guess – e.g. add 2% to the capacity of the qcow2 file to determine likely commitment. Snapshots make life even harder, but to be fair, qcow2 internal snapshots are best avoided regardless in favour of external snapshots. The lack of information around file commitment is a clear gap that needs addressing in both libvirt and QEMU.

That all said, ensuring the sum of commitment values across disk images is within the filesystem free space is only one approach to managing storage. These days QEMU has the ability to live migrate virtual machines even when their disks are on host-local storage – it simply copies across the disk image contents too. So a valid approach is to mostly ignore future commitment implied by disk images, and instead just focus on the near term usage. For example, regularly monitor filesystem usage and if free space drops below some threshold, then migrate one or more VMs (and their disk images) off to another host to free up space for remaining VMs.
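
A minimal sketch of that “monitor and migrate” approach at the libvirt level might look like the following (the path, guest name and destination host are placeholders; in an OpenStack deployment the equivalent would be a block-migrating nova live-migration rather than a direct virsh call):

$ df --output=pcent /var/lib/libvirt/images | tail -1     # check filesystem usage against a threshold
$ virsh migrate --live --copy-storage-all --persistent --undefinesource guest1 qemu+ssh://otherhost/system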

by Daniel Berrange at February 10, 2017 03:58 PM

OpenStack Superuser

From zero to hero: Your first week as an OpenStack contributor

Each year, hundreds of new contributors start working on OpenStack. Most OpenStack projects have mature code bases and contributors who have been developing the code for several years. Ensuring that a new contributor is pointed in the right direction can often be hard work and a little time consuming.

When a newbie asks how to contribute (whether to a project team lead, on the mailing list, or over IRC), the seasoned Stacker will often send them straight to OpenStack Manuals. Why? Because the documentation contribution process is identical to the code contribution process, making it an ideal place to start.

The OpenStack manuals project develops key introductory installation, operation and administration documentation for OpenStack projects. The manuals are a great place to start and provide an invaluable window into each project and how they are operated. This enables the contributor to become familiar with the Git and Gerrit workflow and to feel confident reviewing, responding and reacting to patches and bugs without feeling like they are breaking code lines.

So, from the documentation team to you, here are the Day 0 to Day 5 tips (okay, we’ll be honest, this might work out to more than five days, so take your time!) and links to get you set up during your first week. They’ll help to ensure that by the end of the week, you can feel (and tell your boss!) that you are an OpenStack contribution guru.

Scenario: You’ve been told to get started working on OpenStack, ramp up your contributions and start the journey to becoming a core in a project.

  • If you have no idea what this means, or entails, start at Day 0.
  • If you understand the concepts, but want to know how to get more involved, start at Day 1.

Day 0:

The OpenStack manuals project provides documentation for various OpenStack projects to promote OpenStack and to develop and maintain tools and processes to ensure the quality and accuracy of documentation.

Our team structure is the same as any other OpenStack project. There is a Project Technical Lead (PTL) who ensures that projects and individual tasks are completed and looks after the individual contributor’s requirements, if any. The PTL is the community manager for this project.

A team of core reviewers work with the PTL. Core reviewers are able to +2 and merge content into the projects for which they have core status. Core status is granted to those who have not only shown care and wisdom in their reviews but have also done a sufficient quantity of reviews and commits.

The OpenStack manuals team looks after the repositories listed here.

There are no restrictions on who can submit a patch to the OpenStack manuals. If you are looking to get started, the Contributor Guide is your source for all information.

To begin contributing, use the First Timers section to set up your accounts, Git and Gerrit. We treat documentation like code, so we have the same process as development. We also recommend that you join the documentation mailing list (and introduce yourself with a few lines) and join the #openstack-doc IRC channel on Freenode.
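
A condensed version of that setup, assuming the accounts from the First Timers section already exist (the Contributor Guide remains the authoritative reference):

$ sudo pip install git-review
$ git clone https://git.openstack.org/openstack/openstack-manuals
$ cd openstack-manuals
$ git config --global user.name "Your Name"
$ git config --global user.email "you@example.com"
$ git review -s    # configure the Gerrit remote for this repository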

Day 1:

You have successfully set up your Git and Gerrit workflow and you know what commands to execute to submit a patch for review.

We recommend tackling a low-hanging-fruit bug. These bugs have been triaged by the documentation team and have been designated a “quick fix.” The issue should be described within the bug report or within the comments. However, if you do not believe you understand the bug you can do one of the following:

  • First, join the OpenStack Documentation Bug team. Set the status of the bug as “Incomplete” and ask the reporter to provide more information.
  • If the bug addresses a problem with project specific documentation, contact the Documentation Liaison for the specific project.

Our documentation is written in RST, so ensure that you have an appropriate text editor to assist with your formatting and syntax. You can also find the guidelines for the general writing style that all documentation contributors should follow to ensure consistency throughout all technical publications.

From here, you can either patch the bug and apply the fix, based on the workflow described in the First Timers section, or you can review some of the patches from other people. Reviewing documentation patches is one of the best ways to learn what’s in the guides.
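
As a sketch of the “patch the bug” route (the branch name and bug number are placeholders; the First Timers section remains the canonical reference):

$ git checkout -b fix-bug-XXXXXX
$ # edit the affected RST files
$ git add .
$ git commit      # include a "Closes-Bug: #XXXXXX" line in the commit message
$ git review      # push the change to Gerrit for review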

Day 2:

Reviewing documentation can be confusing – people are replying to the patch with requests, bug reports and maybe even content specifications.

At the beginning of each release cycle, the project teams work out their key deliverables at the Project Team Gathering (PTG). This immediately impacts the documentation – what changes upstream must change in documentation. This usually comes to the documentation in the form of a bug report. The project will report a bug to the OpenStack manuals team by either tagging DocImpact in the commit message of the original development patch or by filing an entirely new bug with a request for documentation to be updated.

While the project teams work out their key deliverables, the documentation team also has a chance to decide on what deliverables need to be met within the guides. This might relate to technical debt, archiving, or perhaps even a mass change that must occur across all of the guides. This work is tracked through a specification.

All patches up for review should link to a specification, bug report, or, at the very least, have a detailed commit message following our Good Practice guidelines.

When reviewing the patch, ensure that the committer has explained why they fixed the problem and ensure that what they say matches the output. If you need to build the documentation to review properly, you can use our build tools, or you can use the gate jobs, gate-[REPO-NAME]-tox-doc-publish-checkbuild, to check the build in your browser.
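
For a local check, something along these lines has worked in the openstack-manuals repository (treat it as a sketch; the tox environment name and output directory can differ between repositories):

$ git review -d CHANGE_NUMBER    # fetch the patch under review; CHANGE_NUMBER is a placeholder
$ tox -e checkbuild              # build the affected guides locally
$ # open the generated HTML (typically under publish-docs/) in a browser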

Here are some guidelines to remember when reviewing someone else’s patch: http://docs.openstack.org/contributor-guide/docs-review.html

Day 3:

On Day 1, you pushed up your first patch. You made iterations based on requests from other individuals and now, according to the guidelines, your patch can merge with the required +2, +1 and +2a. Do not be concerned if your patch is not ready and merged by Day 3, however. Getting a patch reviewed and then merged can often take time.

Once your patch has safely merged, you need to know where to go next. If you would like to work on a specific guide or guides and you don’t know how to get involved, see Day 4.

If you are interested in staying involved but really don’t know what you want to do, we recommend you continue fixing bugs. You can find the list of all bugs that have been confirmed or triaged by the OpenStack manuals team here.

Do not work on any bugs that are labeled “New” and do not have “Confirmed” or “Triaged” in the Status column or any bugs that already have an assignee.

Day 4:

One of the things you might come across on the mailing list and in the Contributor Guide is the mention of specialty teams. To ensure that each of our guides is looked after and the bugs against the guides are dealt with, the documentation team has assigned specialty team leads. You can find the list of each specialty team lead here.

To get more involved in an individual guide, contact the relevant individual listed. Each team often has projects happening that require new contributors. You do not have to specialize in only one guide.

Day 5:

Now that you’ve spent your first week working within the manuals, you have several possible routes to take from here. You can:

  1. Continue working with the documentation team and gain insight into how OpenStack is installed, used, and administered by fixing bugs and working with the specialty teams.
  2. Find more documentation outlets. Each development project has their own developer-tailored documentation. You can find that, and more information, at: docs.openstack.org/develop/PROJECTNAME
  3. Start working on your project of interest! All you need to do is clone the relevant repository and get started!

Good luck!

Join us in #openstack-doc on Freenode to say “hi” and have a chat!

If you choose to contribute to another project, please always come back and document the new changes so that the code can be used by admins.

Cover Photo // CC BY NC

The post From zero to hero: Your first week as an OpenStack contributor appeared first on OpenStack Superuser.

by Alexandra Settle at February 10, 2017 01:00 PM

OpenStack in Production

os_type property for Windows images on KVM


OpenStack images have a long list of properties which can be set to describe the image metadata. The full list is described in the documentation. This blog reviews some of these settings for Windows guests running on KVM, in particular for Windows 7 and Windows 2008R2.

At CERN, we've used a number of these properties to help users filter images, such as by OS distribution and version, but we have also added some additional properties for specific purposes such as

  • when the image was released (so the images can be sorted by date)
  • whether the image is the latest recommended one (such as setting the CentOS 7.2 image to not recommended when CentOS 7.3 comes out)
  • which CERN support team provided the image 

For a typical Windows image, we have

$ glance image-show 9e194003-4608-4fe3-b073-00bd2a774a57
+-------------------+----------------------------------------------------------------+
| Property          | Value                                                          |
+-------------------+----------------------------------------------------------------+
| architecture      | x86_64                                                         |
| checksum          | 27f9cf3e1c7342671a7a0978f5ff288d                               |
| container_format  | bare                                                           |
| created_at        | 2017-01-27T16:08:46Z                                           |
| direct_url        | rbd://b4f463a0-c671-43a8-bd36-e40ab8d233d2/images/9e194003-4   |
| disk_format       | raw                                                            |
| hypervisor_type   | qemu                                                           |
| id                | 9e194003-4608-4fe3-b073-00bd2a774a57                           |
| min_disk          | 40                                                             |
| min_ram           | 0                                                              |
| name              | Windows 10 - With Apps [2017-01-27]                            |
| os                | WINDOWS                                                        |
| os_distro         | Windows                                                        |
| os_distro_major   | w10entx64                                                      |
| os_edition        | DESKTOP                                                        |
| os_version        | UNKNOWN                                                        |
| owner             | 7380e730-d36c-44dc-aa87-a2522ac5345d                           |
| protected         | False                                                          |
| recommended       | true                                                           |
| release_date      | 2017-01-27                                                     |
| size              | 37580963840                                                    |
| status            | active                                                         |
| tags              | []                                                             |
| updated_at        | 2017-01-30T13:56:48Z                                           |
| upstream_provider | https://cern.service-now.com/service-portal/function.do?name   |
| virtual_size      | None                                                           |
| visibility        | public                                                         |
+-------------------+----------------------------------------------------------------+

Recently, we have seen some cases of Windows guests becoming unavailable with the BSOD error "CLOCK_WATCHDOG_TIMEOUT (101)".  On further investigation, these tended to occur around times of heavy load on the hypervisors such as another guest doing CPU intensive work.

Windows 7 and Windows Server 2008 R2 were the guest OSes where these problems were observed. Later OS levels did not seem to show the same problem.

We followed the standard processes to make sure the drivers were all updated but the problem still occurred.

Looking into the root cause, the Red Hat support articles were a significant help.

"In the environment described above, it is possible that 'CLOCK_WATCHDOG_TIMEOUT (101)' BSOD errors could be due to high load within the guest itself. With virtual guests, tasks may take more time that expected on a physical host. If Windows guests are aware that they are running on top of a Microsoft Hyper-V host, additional measures are taken to ensure that the guest takes this into account, reducing the likelihood of the guest producing a BSOD due to time-outs being triggered."

These articles suggested using the os_type parameter to inform the hypervisor to use some additional flags. However, the OpenStack documentation explained this was a XenAPI-only setting (which would therefore not apply to KVM hypervisors).

It is not always clear which parameters to set for an OpenStack image. os_distro takes a value such as 'windows' or 'ubuntu', and while the flavour of the OS could be derived from that, it is the os_type setting that is actually used by the code.

Thus, in order to get the best behaviour for Windows guests, from our experience we would recommend setting both os_distro and os_type as follows (an example command is shown after the list).
  • os_distro = 'windows'
  • os_type = 'windows'
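
One way to apply these to the image shown earlier is with the unified OpenStack client (the image ID is the one from the glance output above):

$ openstack image set --property os_distro=windows --property os_type=windows 9e194003-4608-4fe3-b073-00bd2a774a57
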
When the os_type parameter is set, some additional XML is added to the KVM configuration following the Kilo enhancement.

<features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
  </features>
  ....
  <clock offset='localtime'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>

These changes have led to an improvement when running on loaded hypervisors, especially for Windows 7 and 2008R2 guests. A bug has been opened for the documentation to explain that the setting is not Xen-only.

Acknowledgements

  • Jose Castro Leon performed all of the analysis and testing of the various solutions.

References




by Tim Bell (noreply@blogger.com) at February 10, 2017 07:29 AM

February 09, 2017

Lance Bragstad

Using OpenStack-Ansible to performance test rolling upgrades

I’ve spent the last year or so dabbling with ways to provide consistent performance results for keystone. In addition to that, the keystone community has been trying to implement rolling upgrades support. Getting both of these tested and in the gate would be a huge step forward for developers and deployers.

Today I hopped into IRC and Jesse from the OpenStack-Ansible team passed me a review that used OpenStack-Ansible to performance test keystone during a rolling upgrade… Since that pretty much qualifies as one of the coolest reviews someone has ever handed me, I couldn’t wait to test it out. I was able to do everything on a fresh Ubuntu 16.04 virtual machine with 8 GB of memory, 8 VCPUs and I brought it up to speed following the initial steps provided in the OpenStack-Ansible AIO Guide.

Gist: basic-commands.sh (https://gist.github.com/lbragstad/7e83a983fd83f63074feb600910de2f8)

Next I made sure I had pip and tox available, as well as tmux for my own personal preference. Luckily the OpenStack-Ansible team does a good job of managing binary dependencies in tree, which makes getting fresh installs up and off the ground virtually headache-free. Since the patch was still in review at the time of this writing, I went ahead and checked that out of Gerrit.

Gist: checkout-change.sh (https://gist.github.com/lbragstad/7e83a983fd83f63074feb600910de2f8)
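
The linked gist has the exact commands; a rough equivalent, assuming the openstack-ansible-os_keystone repository and git-review (CHANGE_NUMBER stands in for the review in question), would be something like:

$ sudo apt-get update && sudo apt-get install -y git python-pip tmux
$ sudo pip install tox git-review
$ git clone https://git.openstack.org/openstack/openstack-ansible-os_keystone
$ cd openstack-ansible-os_keystone
$ git review -d CHANGE_NUMBER    # check out the patch under review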

From here, the os_keystone role should be able to set up the infrastructure and environment. Another nice thing about the various roles in OpenStack-Ansible is that they isolate tox environments much like you would for building docs, syntax linting, or running tests using a specific version of Python. In this case, there happens to be one dedicated to upgrades. Behind the scenes this is going to prepare the infrastructure, install LXC, orchestrate multiple installations of the most recent stable keystone release isolated into separate containers (which plays a crucial role in achieving rolling upgrades), install the latest keystone source code from master, and perform a rolling upgrade (whew!). Lucky for us, we only have to run one command.

Gist: tox.sh (https://gist.github.com/lbragstad/7e83a983fd83f63074feb600910de2f8)

The first time I ran tox locally I did get one failure related to the absence of libpq-dev while installing requirements for os_tempest:

Gist: libpq-dev-failure.output (https://gist.github.com/lbragstad/7e83a983fd83f63074feb600910de2f8)

Other folks were seeing the same thing, but only locally. For some reason the gate was not hitting this specific issue (maybe it was using wheels?). There is a patch up for review to fix this. After that I reran tox and was rewarded with:

Gist: success.output (https://gist.github.com/lbragstad/7e83a983fd83f63074feb600910de2f8)

Not only do we see that the rolling upgrade succeeded according to os_keystone‘s functional tests, but we also see the output from the performance tests. There were 2527 total requests during the execution of the upgrade, 10 of which resulted in an error (could probably use some tweaking here to see if node rotation timing using HAProxy mitigates those?).

Next Steps

Propose a rolling upgrade keystone gate job

Now that we have a consistent way to test rolling upgrades while running a performance script, we can start looping this into other gate jobs. It would be awesome to be able to leverage this work to test every patch proposed to ensure it is not only performant, but also maintains our commitment to delivering rolling upgrades.

Build out the performance script

The performance script is just python that gets fed into Locust. The current version is really simple and only focuses on authenticating for a token and validating it. Locust has some flexibility that allows writers to add new test cases and even assign different call percentages to different operations (i.e. authenticate for a token 30% of the time and validate 70% of the time). Since it’s all python making API calls, Locust test cases are really just functional API tests. This makes it easy to propose patches that add more scenarios as we move forward, increasing our rolling upgrade test coverage. From the output we should be able to inspect which calls failed, just like today when we saw we had 10 authentication/validation failures.

Publish performance results

If we run this as part of the gate, it would be a waste not to stash or archive the results from each run (especially if two separate projects are running it). We could even look into running it on dedicated hardware somewhere, similar to the performance testing project I was experimenting with last year. The OSIC Performance Bot would technically be a first-class citizen gate job (and we could retire the first iteration of it!). All the results could be stuffed away somewhere and made available for people to write tools that analyze it. I’d personally like to revamp our keystone performance site to continuously update according to the performance results from the latest master patch. Maybe we could even work some sort of performance view into OpenStack Health.

The final bit that helps seal the deal is that we get this at the expense of a single virtual machine. Since OpenStack-Ansible uses containers to isolate services we can feel confident in testing rolling upgrades while only consuming minimal gate resources.

I’m looking forward to doing a follow-up post as we hopefully start incorporating this into our gate.

by lbragstad at February 09, 2017 10:45 PM

Rich Bowen

Project Leader

I was recently asked to write something about the project that I work on – RDO – and one of the questions that was asked was:

A healthy project has a visible lead(s). Who is the project lead(s) for this project?

This struck me as a strange question because, for the most part, the open source projects that I choose to work on don’t have a project lead, but are, rather, led by community consensus, as well as a healthy dose of “Just Do It”. This is also the case with RDO, where decisions are discussed in public on the mailing list, and on IRC meetings, and those that step up to do the work have more practical influence than those that just talk about it.

Now, this isn’t to say that nobody takes leadership or ownership of the projects. In many senses, everyone does. But, of course, certain people do rise to prominence from time to time, just based on the volume of work that they do, and these people are the de facto leaders for that moment.

There’s a lot of different leadership styles in open source, and a lot of projects do in fact choose to have one technical leader who has the final say on all contributions. That model can work well, and does in many cases. But I think it’s important for a project to ask itself a few questions:

  • What do we do when a significant number of the community disagrees with the direction that this leader is taking things?
  • What happens when the leader leaves? This can happen for many different reasons, from vacation time, to losing interest in the project, to death.
  • What do we do when the project grows in scope to the point that a single leader can no longer be an expert on everything?

A strong leader who cares about their project and community will be able to delegate, and designate replacements, to address these concerns. A leader who is more concerned with power or ego than with the needs of their community is likely to fail on one or more of these tests.

But, I find that I greatly prefer projects where project governance is of the people, by the people, and for the people.

by rbowen at February 09, 2017 10:01 PM

Maish Saidel-Keesing

I am Running for the OpenStack User Committee

Two days ago I decided to submit my candidacy for one of the two spots up for election (for the first time!) on the OpenStack User committee.

I am pasting my proposal verbatim (original email link here)…

Good evening to you all.

As others have so kindly stepped up - I would also like to self-nominate as a candidate for the User Committee.

I have been involved in the OpenStack community since the Icehouse release.

From day 1,  I felt that the user community was not completely accepted as a part of the OpenStack community and that there was a clear and broad disconnect between the two parts of OpenStack.

Instead of going all the way back - and stepping through time to explain who I am and what I have done - I have chosen a few significant points along the way - of where I think I made an impact - sometimes small - but also sometimes a lot bigger.

  • The OpenStack Architecture Design Guide [1]. This was my first Opensource project and it was an honor to participate and help the community to produce such a valuable resource.
  • Running for the TC for the first time [2]. I was not elected.
  • Running for the TC for the second time [3]. Again I was not elected.

    (There has never been a member of the User community elected to a TC seat - AFAIK)

In my original candidacy [2] proposal - I mentioned the inclusion of others.

Which is why I am so proud of the achievement of the definition of the AUC from the last cycle and the workgroup [4] that Shamail Tahir and I co-chaired
(needless to say, a **huge** amount of the credit also goes to all the other members of the WG that were involved!!) in making this happen.

Over the years I think I have tried to make a difference (perhaps not always in the right way) - maybe the developer community was not ready for such a drastic change - and I still think that they are not.

Now is a time for change.

I think that the User Committee and this upcoming election (which is the first ever) are a critical time for all of us who are part of the OpenStack community - who contribute in numerous ways - **but do not contribute code**.

The User Committee is now becoming what it should have been from the start, an equal participant in the 3 pillars of OpenStack.

I would like to be a part, actually I would be honored to be a part, of ensuring that this comes to fruition and would like to request your vote for the User Committee.

Now down to the nitty gritty. If elected I would like to focus on the following (but not only):

  1. Establishing the User Committee as a significant part of OpenStack - and continuing the amazing collaboration that has been forged over the past two years. The tangible feedback to the OpenStack community provided by the Working Groups has defined clear requirements coming from the trenches that need to be addressed throughout the community as a whole.
  2. Expand the AUC constituency - both by adding additional criteria and by encouraging more participation in the community according to the initial defined criteria.
  3. Establish a clear and fruitful working relationship with the Technical Committee - enabling the whole of OpenStack to continue to evolve, produce features and functionality that is not only cutting edge but also fundamental and crucial to anyone and everyone using OpenStack today.

Last but not least - I would like to point you to a blog post I wrote almost a year ago [5].

My views have not changed. OpenStack is evolving and needs participation not only from the developer community (which by the way is facing more than enough of its own challenges) but also from us who use, and operate OpenStack.

For me - we are already in a better place - and things will only get better - regardless of who leads the User committee.

Thank you for your consideration - and I would like to wish the best of luck to all the other candidates.

--
Best Regards,
Maish Saidel-Keesing

[1] http://technodrone.blogspot.com/2014/08/the-openstack-architecture-design-guide.html

[2] http://lists.openstack.org/pipermail/openstack-dev/2015-April/062372.html

[3] http://lists.openstack.org/pipermail/openstack-dev/2015-September/075773.html

[4] https://wiki.openstack.org/wiki/AUCRecognition

[5] http://technodrone.blogspot.com/2016/03/we-are-all-openstack-are-we-really.html

Elections open up on February 13th and only those who have been recognized as AUC (Active User Contributors) are eligible to vote.

Don’t forget to vote!

by Maish Saidel-Keesing (noreply@blogger.com) at February 09, 2017 09:41 PM

NFVPE @ Red Hat

Let’s (manually) run k8s on CentOS!

So sometimes it’s handy to have a plain-old-Kubernetes running on CentOS 7. Either for development purposes, or to check out something new. Our goal today is to install Kubernetes by hand on a small cluster of 3 CentOS 7 boxen. We’ll spin up some libvirt VMs running CentOS generic cloud images, get Kubernetes spun up on those, and then we’ll run a test pod to prove it works. Also, this gives you some exposure to some of the components that are running ‘under the hood’.

by Doug Smith at February 09, 2017 08:10 PM

Graham Hayes

OpenStack Designate - Where we are.


I have been asked a few times recently "What is the state of the Designate project?", "How is Designate getting on?", and by people who know what is happening "What are you going to do about Designate?".

Needless to say, all of this is depressing to me, and the people that I have worked with for the last number of years to make Designate a truly useful, feature rich project.

Note

TL;DR; for this - Designate is not in a sustainable place.

To start out - Designate has always been a small project. DNS does not have massive cool appeal - it's not shiny, pretty, or something you see on the front page of HackerNews (unless it breaks - then oh boy do people become DNS experts).

A line a previous PTL for the project used to use, and which I have happily robbed, is "DNS is like plumbing, no one cares about it until it breaks, and then you are standing knee deep in $expletive". (As an aside, that was the reason we chose the crocodile as our mascot - it's basically a dinosaur, old as dirt, and when it bites it causes some serious complications).

Unfortunately that comes over into the development of DNS products sometimes. DNSaaS is a check box on a tender response, an assumption.

We were lucky in the beginning - we had 2 large(ish) public clouds that needed DNS services, and nothing currently existed in the eco-system, so we got funding for a team from a few sources.

We got a ton done in that period - we moved from a v1 API which was synchronous to a new v2 async API, we massively increased the amount of DNS servers we supported, and added new features.

Unfortunately, this didn't last. Internal priorities within companies sponsoring the development changed, and we started to shed contributors, which happens, however disappointing. Usually when this happens if a project is important enough the community will pick up where the previous group left off.

We have yet to see many (meaningful) commits from the community though. We have some great deployers who will file bugs, and if they can put up patch sets - but they are (incredibly valuable and appreciated) tactical contributions. A project cannot survive on them, and we are no exception.

So where does that leave us? Let's have a look at how many actual commits we have had:

Commits per cycle
Havana 172
Icehouse 165
Juno 254
Kilo 340
Liberty 327
Mitaka 246
Newton 299
Ocata 98

Next cycle, we are going to have 2 community goals:

  • Control Plane API endpoints deployment via WSGI
  • Python 3.5 functional testing

We would actually have been OK for the tempest one - we were one of the first projects with an external repo based plug-in, designate-tempest-plugin

For WSGI based APIs, this will be a chunk of work - due to our internal code structure, splitting out the API is going to be ... an issue. (And I think it will be harder than most people expect - anyone using oslo.service has eventlet imported - I am not sure how that affects running in a WSGI server.)

Python 3.5 - I have no idea. We can't even run all our unit tests on Python 3.5, so I suspect getting functional testing may be an issue. And convincing management that re-factoring parts of the code base for "community goals" or a future potential pay-off is worthwhile can be more difficult than it should be.

/images/oct-2016-projects-prod.jpg

We now have a situation where the largest "non-core" project [1] in the tent has a tiny number of developers working on it. 42% of deployers are evaluating Designate, so we should see this start to increase.

How did this happen?

Like most situations, there is no single cause.

Certainly there may have been fault on the side of the Designate leadership. We had started out as a small team, and had built a huge amount of trust and respect based on in person interactions over a few years, which meant that there was a fair bit of "tribal knowledge" in the heads of a few people, and that new people had a hard time becoming part of the group.

Also, due to volume of work done by this small group, a lot of users / distros were OK leaving us work - some of us were also running a production designate service during this time, so we knew what we needed to develop, and we had pretty quick feedback when we made a mistake, or caused a bug. All of this resulted in the major development cost being funded by two companies, which left us vulnerable to changes in direction from those companies. Then that shoe dropped. We are now one corporate change of direction from having no cores on the project being paid to work on the project. [2]

Preceding this, the governance of OpenStack changed to the Big Tent. While this change was a good thing for the OpenStack project as a whole, it had quite a bad impact on us.

Pre Big Tent, you got integrated. This was at least a cycle, where you moved docs to docs.openstack.org, integrated with QA testing tooling, got packaged by Linux distros, and built cross project features.

When this was a selective thing, there were teams available to help with that: docs teams would help with content (and tooling - docs was a mass of XML back then), QA would help with tempest and devstack, and horizon would help with panels.

In Big Tent, there just weren't the resources to do this - the scope of the project expansion was huge. However the big tent happened (in my opinion - I have written about this before) before the horizontal / cross project teams were ready. They stuck to covering the "integrated" projects, which was all they could do at the time.

This left us in a position of having to reimplement tooling, figure out what tooling we did have access to, and migrate everything we had on our own. And, as a project that (at our peak level of contribution) only ever had 5% of the number of contributors compared to a project like nova, this put quite a load on our developers. Things like grenade, tempest and horizon plug-ins took weeks to figure out, all of which took time away from other vital things like docs, functional tests and getting designate into other tools.

One of the companies who invested in designate had a QE engineer that used to contribute, and I can honestly say that the quality of our testing improved 10 fold during the time he worked with us. Not just from in repo tests, but from standing up full deployment stacks, and trying to break them - we learned a lot about how we could improve things from his expertise.

Which is kind of the point I think. Nobody is amazing at everything. You need people with domain knowledge to work on these areas. If you asked me to do a multi-node grenade job, I would either start drinking, throw my laptop at you or do both.

We still have some of these problems to this day - most of our docs are in a messy pile in docs.openstack.org/developer/designate while we still have a small amount of old functional tests that are not ported from our old non plug-in style.

All of this adds up to make projects like Designate much less attractive to users - we just need to look at the project navigator to see what a bad image potential users get of us. [3] This is for a project that was ran as a full (non beta) service in a public cloud. [4]

Where to now, then?

Well, this is where I call out to people who actually use the project - don't jump ship and use something else because of the picture I have painted. We are a dedicated team, who cares about the project. We just need some help.

I know there are large telcos who use Designate. I am sure there is tooling or docs built up in these companies that could be very useful to the project.

Nearly every commercial OpenStack distro has Designate. Some have had it since the beginning. Again, developers, docs, tooling, testers, anything and everything is welcome. We don't need a massive amount of resources - we are a small ish, stable, project.

We need developers with upstream time allocated, and the budget to go to events like the PTG - for cross project work, and internal designate road map, these events form the core of how we work.

We also need help from cross project teams - the work done by them is brilliant but it can be hard for smaller projects to consume. We have had a lot of progress since the Leveller Playing Field debate, but a lot of work is still optimised for the larger teams who get direct support, or well resourced teams who can dedicate people to the implementation of plugins / code.

As someone I was talking to recently said - AWS is not winning public cloud because of commodity compute (that does help - a lot), but because of the added services that make using the cloud, well, cloud-like. OpenStack needs to decide whether it is just compute, or whether it wants the eco-system. [5] Designate is far from alone in this.

I am happy to talk to anyone about helping to fill in the needed resources - Designate is a project that started in the very office I am writing this blog post in, and something I want to last.

For a visual this is Designate team in Atlanta, just before we got incubated.

/images/ATL.jpg

and this was our last mid cycle:

/images/mid-cycle.jpg

and in Atlanta at the PTG, there will be two of us.

[1]In the Oct-2016 User Survey Designate was deployed in 23% of clouds
[2]I have been lucky to have a management chain that is OK with me spending some time on Designate, and have not asked me to take time off for Summits or Gatherings, but my day job is working on a completely different project.
[3]I do have other issues with the metrics - mainly that we existed before leaving stackforge, and some of the other stats are set so high, that non "core" projects will probably never meet them.
[4]I recently went to an internal training talk, where they were talking about new features in Newton. There was a whole slide about how projects had improved, or gotten worse on these scores. A whole slide. With tables of scores, and I think there may have even been a graph.
[5]Now, I am slightly biased, but I would argue that DNS is needed in commodity compute, but again, that is my view.

by Graham Hayes at February 09, 2017 06:38 PM

OpenStack Superuser

CERN’S expanding cloud universe

CERN is rapidly expanding OpenStack cores in production as it accelerates work on understanding the mysteries of the universe.

The European Organization for Nuclear Research currently has over 190,000 cores in production and plans to add another 100,000 in the next six months, says Spyros Trigazis, adding that about 90 percent of CERN’s compute resources are now delivered on OpenStack.

Trigazis, who works on the compute management and provisioning team, offered a snapshot of all things cloud at CERN in a presentation at the recent CentOS Dojo in Brussels.

RDO’s Rich Bowen shot the video, which runs through CERN’s three-and-a-half years of OpenStack in production as well as what’s next for the humans in the CERN loop — the OpenStack team, procurement and software management and LinuxSoft, Ceph and DBoD teams.

Trigazis also outlined the container infrastructure, which uses OpenStack Magnum to treat container orchestration engines (COEs) as first-class resources. Since Q4 2016, CERN has been in production with Magnum providing support for Docker Swarm, Kubernetes and Mesos as well as storage drivers for (CERN-specific) EOS and CernVM File System (CVMFS). Trigazis says that many users are interested in containers and usage has been ramping up around GitLab continuous integration, Jupyter/Swan and FTS. CERN is currently using the Newton release, with “cherry-picks,” he adds.


Upcoming services include bare metal with Ironic; the API server and conductor are already deployed and the first node is to come this month. Another is the workflow service Mistral, used to simplify operations, create users and clean up resources. It’s already deployed and right now the team is testing prototype workflows. File share service Manila, which has been in pilot mode since Q4 of 2016, will be used to share configuration and certificates.

You can catch the entire 19-minute presentation on YouTube or more videos from CentOS Dojo on the RDO blog.

For updates from the CERN cloud team, check out the OpenStack in Production blog.

 

Cover Photo // CC BY NC

The post CERN’S expanding cloud universe appeared first on OpenStack Superuser.

by Nicole Martinelli at February 09, 2017 01:09 PM

February 08, 2017

Daniel P. Berrangé

Commenting out XML snippets in libvirt guest config by stashing it as metadata

Libvirt uses XML as the format for configuring objects it manages, including virtual machines. Sometimes when debugging / developing it is desirable to comment out sections of the virtual machine configuration to test some idea. For example, one might want to temporarily remove a secondary disk. It is not always desirable to just delete the configuration entirely, as it may need to be re-added immediately after. XML has support for comments <!-- .... some text --> which one might try to use to achieve this. Using comments in XML fed into libvirt, however, will result in an unwelcome surprise – the commented out text is thrown into /dev/null by libvirt.

This is an unfortunate consequence of the way libvirt handles XML documents. It does not consider the XML document to be the master representation of an object’s configuration – a series of C structs are the actual internal representation. XML is simply a data interchange format for serializing structs into a text format that can be interchanged with the management application, or persisted on disk. So when receiving an XML document libvirt will parse it, extracting the pieces of information it cares about, which are then stored in memory in some structs, while the XML document is discarded (along with the comments it contained). Given this way of working, to preserve comments would require libvirt to add 100’s of extra fields to its internal structs and extract comments from every part of the XML document that might conceivably contain them. This is totally impractical to do in reality. The alternative would be to consider the parsed XML DOM as the canonical internal representation of the config. This is what the libvirt-gconfig library in fact does, but it means you can no longer just do simple field accesses to access information – getter/setter methods would have to be used, which quickly becomes tedious in C. It would also involve refactoring almost the entire libvirt codebase, so such a change in approach would realistically never be done.

Given that it is not possible to use XML comments in libvirt, what other options might be available ?

Many years ago libvirt added the ability to store arbitrary user defined metadata in domain XML documents. The caveat is that they have to be located in a specific place in the XML document as a child of the <metadata> tag, in a private XML namespace. This metadata facility can be used as a hack to temporarily stash some XML out of the way. Consider a guest which contains a disk to be “commented out”:

<domain type="kvm">
  ...
  <devices>
    ...
    <disk type='file' device='disk'>
    <driver name='qemu' type='raw'/>
    <source file='/home/berrange/VirtualMachines/demo.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    ...
  </devices>
</domain>

To stash the disk config as a piece of metadata requires changing the XML to

<domain type="kvm">
  ...
  <metadata>
    <s:disk xmlns:s="http://stashed.org/1" type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/home/berrange/VirtualMachines/demo.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </s:disk>
  </metadata>
  ...
  <devices>
    ...
  </devices>
</domain>

What we have done here is

– Added a <metadata> element at the top level
– Moved the <disk> element to be a child of <metadata> instead of a child of <devices>
– Added an XML namespace to <disk> by giving it an ‘s:’ prefix and associating a URI with this prefix

Libvirt only allows a single top level metadata element per namespace, so if there are multiple things to be stashed, just give them each a custom namespace, or introduce an arbitrary wrapper. Aside from mandating the use of a unique namespace, libvirt treats the metadata as entirely opaque and will not try to interpret or parse it in any way. Any valid XML construct can be stashed in the metadata, even invalid XML constructs, provided they are hidden inside a CDATA block. For example, if you’re using virsh edit to make some changes interactively and want to get out before finishing them, just stash the changes in a CDATA section, avoiding the need to worry about correctly closing the elements.

<domain type="kvm">
  ...
  <metadata>
    <s:stash xmlns:s="http://stashed.org/1">
    <![CDATA[
      <disk type='file' device='disk'>
        <driver name='qemu' type='raw'/>
        <source file='/home/berrange/VirtualMachines/demo.qcow2'/>
        <target dev='vda' bus='virtio'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
      </disk>
      <disk>
        <driver name='qemu' type='raw'/>
        ...i'll finish writing this later...
    ]]>
    </s:stash>
  </metadata>
  ...
  <devices>
    ...
  </devices>
</domain>

Admittedly this is a somewhat cumbersome solution. In most cases it is probably simpler to just save the snippet of XML in a plain text file outside libvirt. This metadata trick, however, might just come in handy some times.
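
For completeness, newer libvirt versions let you read or interactively edit a given metadata namespace directly with virsh, which is handy for checking what was stashed; a sketch, using the namespace URI from the example above and a placeholder domain name:

$ virsh metadata demo http://stashed.org/1           # print the stashed snippet
$ virsh metadata demo http://stashed.org/1 --edit    # edit it in $EDITOR
$ virsh dumpxml demo                                 # or just inspect the full domain XML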

As an aside, the real, intended usage of the <metadata> facility is to allow applications which interact with libvirt to store custom data they may wish to associate with the guest. As an example, the recently announced libvirt websockets console proxy uses it to record which consoles are to be exported. I know of few other real world applications using this metadata feature, however, so it is worth remembering it exists :-) System administrators are free to use it for local book keeping purposes too.

by Daniel Berrange at February 08, 2017 07:14 PM

OpenStack Superuser

User Committee Elections are live

OpenStack has been a vast success and continues to grow. Additional ecosystem partners are enhancing support for OpenStack and it has become more and more vital that the communities developing services around OpenStack lead and influence the product's movement.

The OpenStack User Committee helps increase operator involvement, collects feedback from the community, works with user groups around the globe, and parses through user survey data, to name a few. Users are critical, and the User Committee aims to represent the user. With all the growth we are seeing with OpenStack, we are looking to expand the User Committee and have kicked off an election.

We are looking to elect two (2) User Committee members for this election. These User Committee seats will be valid for a one-year term. For this election, the Active User Contributors (AUC) community will review the candidates and vote.

So what makes an awesome candidate for the User Committee?

Well, to start, the nominee has to be an individual member of the OpenStack Foundation who is an Active User Contributor (AUC).  Additionally, below are a few things that will make you stand out:

  • You are an OpenStack end-user and/or operator
  • You are an OpenStack contributor from the User Committee working groups
  • You are actively engaged in the OpenStack community
  • You are an organizer of an OpenStack local User Group meetup

Beyond the kinds of community activities you are already engaged in, the User Committee role adds some additional work. The User Committee usually interacts on e-mail to discuss any pending topics. Prior to each Summit, we spend a few hours going through the User Survey results and analyzing the data.

You can nominate yourself or someone else by sending an email to the user-committee@lists.openstack.org mailing-list, with the subject: “UC candidacy” by Friday, February 10, 05:59 UTC.

The email should include a description of the candidate and what the candidate hopes to accomplish.

We look forward to receiving your submissions!

The post User Committee Elections are live appeared first on OpenStack Superuser.

by Superuser at February 08, 2017 12:02 PM

Bernard Cafarelli

Tracking Service Function Chaining with Skydive

Skydive is “an open source real-time network topology and protocols analyzer”. It is a tool (with CLI and web interface) to help analyze and debug your network (OpenStack, OpenShift, containers, …). Dropped packets somewhere? MTU issues? Routing problems? These are some issues where running skydive will help.

So as an update on my previous demo post (this time based on the Newton release), let’s see how we can trace SFC with this analyzer!

devstack installation

Not a lot of changes here: check out devstack on the stable/newton branch, grab the local.conf file I prepared (configured to use the skydive 0.9 release) and run “./stack.sh”!

For the curious, the SFC/Skydive specific parts are:
# SFC
enable_plugin networking-sfc https://git.openstack.org/openstack/networking-sfc stable/newton

# Skydive
enable_plugin skydive https://github.com/skydive-project/skydive.git refs/tags/v0.9.0
enable_service skydive-agent skydive-analyzer

Skydive web interface and demo instances

Before running the script to configure the SFC demo instances, open the skydive web interface (it listens on port 8082, check your instance firewall if you cannot connect):

http://${your_devstack_ip}:8082

The login was configured with devstack, so if you did not change it, use admin/pass123456.
Then add the demo instances as in the previous demo:
$ git clone https://github.com/voyageur/openstack-scripts.git -b sfc_newton_demo
$ ./openstack-scripts/simple_sfc_vms.sh

And watch as your cloud goes from “empty” to “more crowded”:

Skydive CLI, start traffic capture

Now let’s enable traffic capture on the integration bridge (br-int), and all tap interfaces (more details on the skydive CLI available in the documentation):
$ export SKYDIVE_USERNAME=admin
$ export SKYDIVE_PASSWORD=pass123456
$ /opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client capture create --gremlin "G.V().Has('Name', 'br-int', 'Type', 'ovsbridge')"
$ /opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client capture create --gremlin "G.V().Has('Name', Regex('^tap.*'))"

Note this can be done in the web interface too, but I wanted to show both interfaces.

Track a HTTP request diverted by SFC

Make a HTTP request from the source VM to the destination VM (see previous post for details). We will highlight the nodes where this request has been captured: in the GUI, click on the capture create button, select “Gremlin expression”, and use the query:
G.Flows().Has('Network','10.0.0.18','Transport','80').Nodes()

This expression reads as “on all captured flows matching IP address 10.0.0.18 and port 80, show nodes”. With the CLI you would get a nice JSON output of these nodes; here in the GUI these nodes will turn yellow:

If you look at our tap interface nodes, you will see that two are not highlighted. If you check their IDs, you will find that they belong to the same service VM, the one in group 1 that did not get the traffic.
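
As an aside, the HTTP request mentioned at the start of this section can be generated from the source VM with something as simple as the following (10.0.0.18 and port 80 are the destination used in the Gremlin query above):

$ curl http://10.0.0.18/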

If you want to single out a request, in the skydive GUI, select one node where capture is active (for example br-int). In the flows table, select the request, scroll down to get its layer 3 tracking ID “L3TrackingID” and use it as Gremlin expression:
G.Flows().Has('L3TrackingID','5a7e4bd292e0ba60385a9cafb22cf37d744a6b46').Nodes()

Going further

Now it’s your time to experiment! Modify the port chain, send a new HTTP request, get its L3TrackingID, and see its new path. I find the latest ID quickly with this CLI command (we will see how the skydive experts will react to this):
$ /opt/stack/go/bin/skydive --conf /tmp/skydive.yaml client topology query --gremlin "G.Flows().Has('Network','10.0.0.18','Transport','80').Limit(1)" | jq ".[0].L3TrackingID"

You can also check each flow in turn, following the paths from a VM to another one, go further with SFC, or learn more about skydive:

by Bernard Cafarelli at February 08, 2017 11:43 AM

NFVPE @ Red Hat

Automated OSP deployments with Tripleo Quickstart

In this article I’m going to show a method for automating OSP (RedHat OpenStack platform) deployments. These automated deployments can be very useful for CI, or simply to experiment and test with the system. Components involved:
  • ansible-cira: a set of playbooks to deploy Jenkins, jenkins-job-builder and an optional ELK stack. This will install a ready to use system with all the preconfigured jobs (including OSP10 deployments and image building).
  • ansible-cira jenkins-jobs: a set of job templates and macros, using jenkins-job-builder syntax, that get converted into Jenkins jobs for building the OSP base images and for deploying the system.
  • ansible-cira job-configs: a…

by Yolanda Robla Mota at February 08, 2017 10:42 AM

February 07, 2017

Cloudwatt

Instance backup script

This script is designed to allow you to schedule backups of your Nova instances.

You can set a policy to retain your backups.

Since the script is written in python, it can be run from any machine on which python is installed.

You do not need to install OpenStack clients to run this script.

Retrieving the script from the Git repository:

$ git clone https://github.com/myorg92/os-nova-backup.git
$ cd os-nova-backup/

Running the script requires that some environment variables be loaded:

OS_USERNAME: The user name of your OpenStack account (on Cloudwatt, this is your email address)

OS_PASSWORD: The password of your OpenStack account

OS_TENANT_ID: Identifier of the OpenStack tenant

OS_AUTH_URL: The URL of the identity service (on Cloudwatt: https://identity.fr2.cloudwatt.com/v2.0)

OS_COMPUTE_URL: The URL of the Compute service (on Cloudwatt: https://compute.fr1.cloudwatt.com/v2/<tenant-id> for fr1, https://compute.fr2.cloudwatt.com/v2/<tenant-id> for fr2)
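
For example, an openrc-style snippet can be sourced before running the script (all values below are placeholders):

$ cat > os-backup-rc <<'EOF'
export OS_USERNAME="user@example.com"
export OS_PASSWORD="secret"
export OS_TENANT_ID="1234567890abcdef"
export OS_AUTH_URL="https://identity.fr2.cloudwatt.com/v2.0"
export OS_COMPUTE_URL="https://compute.fr2.cloudwatt.com/v2/1234567890abcdef"
EOF
$ source os-backup-rc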

Usage: python backup.py <server_id> <server_name> <weekly|daily> <rotation>

Positional arguments:

< server_id > ID of the instance to backup

< server_name > Name of the backup image.

< weekly|daily > Type of backup : “daily” or “weekly”.

<rotation> Integer representing the number of backups to keep

You can schedule backups via crontab or Jenkins for example. Here are 2 examples of cron task:

Weekly backup with a retention of 4 weeks

0 2 * * 6 python backup.py 0a49912c-1661-4d92-b469-53dfea7ce3da Myinstance weekly 4

Daily backup with 1 week retention

0 2 1-5,7 * *  python backup.py 0a49912c-1661-4d92-b469-53dfea7ce3da Myinstance daily 6

Backups will be named as “instance name” - “year / day / hour / minute” - “type of backup”. Example: myinstance-20171281425-weekly

This is actually a snapshot that will be stored as an image in your tenant. This image can be used to launch new instances or to restore an existing instance.

This version of the script does not yet support restore (rebuild). It is therefore necessary to use the CLI:

$ nova rebuild 0a49912c-1661-4d92-b469-53dfea7ce3da myinstance-20171281425-weekly

by Kamel Yamani at February 07, 2017 11:00 PM

OpenStack in Production

Tuning hypervisors for High Throughput Computing

Over the past set of blogs, we've looked at a number of different options for tuning High Energy Physics workloads in a KVM environment such as the CERN OpenStack cloud.

This is a summary of the findings using the HEPSpec 06 benchmark on KVM and a comparison with Hyper-V for the same workload.

For KVM on this workload, we saw a degradation in performance on large VMs.


Results for other applications may vary, so each option should be verified for the target environment. The percentages from our optimisations are not necessarily additive but give an indication of the performance improvements to be expected. After applying the following improvements, we saw around 5% overhead remaining compared to bare metal.

Option           Improvement   Comments
CPU topology     ~0            The primary focus for this function was not performance so the result is as expected
Host Model       4.1-5.2%      Some impacts on operations such as live migration
Turn EPT off     6%            Open bug report for CentOS 7 guest on CentOS 7 hypervisor
Turn KSM off     0.9%          May lead to an increase in memory usage
NUMA in guest    ~9%           Needs Kilo or later to generate this with OpenStack
CPU Pinning      ~3%           Needs Kilo or later (cumulative on top of NUMA)

Different applications will see a different range of improvements (and some of these options may even degrade performance). Experiences from tuning other workloads would be welcome.

One of the things that led us to focus on KVM tuning was the comparison with Hyper-V. At CERN, we made an early decision to run a multi-hypervisor cloud building on the work by cloudbase.it and Puppet on Windows to share the deployment scripts for both CentOS and Windows hypervisors. This allows us to direct appropriate workloads to the best hypervisor for the job.

When we saw a significant overhead with the default KVM configuration, one of the tests was to compare the performance overhead for a Linux configuration on Hyper-V. Interestingly, Hyper-V achieved better performance without tuning than the KVM configurations. Equivalent tests on Hyper-V showed
  • 4 VMs 8 cores: 0.8% overhead compared to bare metal 
  • 1 VM 32 cores: 3.3% overhead compared to bare metal
These performance results suggested that the overhead was a matter of tuning the hypervisor rather than a fundamental problem with virtualisation, which the NUMA and CPU pinning results above confirmed.

The Hyper-V configuration pins each core to the underlying NUMA socket, which is similar to how the Kilo NUMA tuning sets up KVM.

This gives the following Linux guest configuration, as seen from a guest running on a Hyper-V hypervisor:

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 28999 MB
node 0 free: 27902 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 29000 MB
node 1 free: 28027 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

Thanks to the QEMU discuss mailing list and to the other team members who helped understand the issue: Sean Crosby (University of Melbourne) and Arne Wiebalck, Sebastian Bukowiec and Ulrich Schwickerath (CERN).

by Tim Bell (noreply@blogger.com) at February 07, 2017 07:45 PM

EPT, Huge Pages and Benchmarking


Having reported that EPT has a negative influence on the High Energy Physics standard benchmark HepSpec06, we have started the deployment of those settings across the CERN OpenStack cloud,
  • Setting the flag in /etc/modprobe.d/kvm_intel.conf to off
  • Waiting for the work on each guest to finish after stopping new VMs on the hypervisor
  • Changing the flag and reloading the module (see the sketch after this list)
  • Enabling new work for the hypervisor
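A minimal sketch of that change on a single hypervisor, assuming all VMs have already been drained from the host:

$ echo "options kvm_intel ept=0" | sudo tee /etc/modprobe.d/kvm_intel.conf
$ sudo modprobe -r kvm_intel
$ sudo modprobe kvm_intel
$ cat /sys/module/kvm_intel/parameters/ept
N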
According to the HS06 tests, this should lead to a reasonable performance improvement. However, certain users reported significantly worse performance than previously. In particular, some workloads showed significant differences in their before and after characteristics.

Before the change, the workload was primarily CPU bound, spending most of its time in user space. CERN applications have to process significant amounts of data, so it is not always possible to ensure 100% utilisation, but the aim is to give the workload as much user-space CPU as possible.


When EPT was turned off, some hypervisors showed a very different performance profile: a major increase in non-user load and a reduction in throughput for the experiment workloads. However, this effect was not observed on the servers with AMD processors.


With tools such as perf, we were able to trace the time down to handling the TLB misses. Perf gives:

78.75% [kernel] [k] _raw_spin_lock
6.76% [kernel] [k] set_spte
1.97% [kernel] [k] memcmp
0.58% [kernel] [k] vmx_vcpu_run
0.46% [kernel] [k] ksm_do_scan
0.44% [kernel] [k] vcpu_enter_guest
The process behind the _raw_spin_lock is qemu-kvm.
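
For reference, a system-wide profile like the one above can be captured with something along these lines (our exact invocation may have differed):

$ perf record -a -g -- sleep 30
$ perf report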

Using systemtap kernel backtraces, we see mostly page faults and spte_* functions (shadow page table updates).

Neither of these should be necessary if you have hardware support for address translation, i.e. EPT.

There may be specific application workloads for which the EPT setting is non-optimal; in the worst case, performance was several times slower. EPT/NPT increases the cost of page table walks when a page is not cached in the TLB. This document shows how processors can speed up page walks - http://www.cs.rochester.edu/~sandhya/csc256/seminars/vm_yuxin_yanwei.pdf - and AMD includes a page walk cache in its processors, which speeds up walking the pages, as described in this paper: http://vglab.cse.iitd.ac.in/~sbansal/csl862-virt/readings/p26-bhargava.pdf

In other words, EPT slows down HS06 results when small pages are involved because the HS06 benchmarks miss the TLB a lot. NPT doesn't slow it down because AMD has a page walk cache to help speed up finding pages that are not in the TLB. EPT comes good again with large pages because they rarely result in a TLB miss. So, HS06 is probably representative of most of the job types, but there is a small share of jobs which are different and triggered the above-mentioned problem.

However, with EPT on we have a 6% overhead for the benchmark compared to previous runs, as mentioned in the previous blog. To mitigate the EPT overheads, following the comments on the previous blog, we looked into using dedicated huge pages. Our hypervisors run CentOS 7 and thus support both transparent huge pages and statically reserved huge pages. Transparent huge pages perform a useful job under normal circumstances but are opportunistic in nature; they are also limited to 2MB and cannot use the 1GB maximum size.

We tried setting the default huge page size to 1GB using the grub command line configuration.

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ cat /boot/grub2/grub.cfg | grep hugepage
linux16 /vmlinuz-3.10.0-229.11.1.el7.x86_64 root=UUID=7d5e2f2e-463a-4842-8e11-d6fac3568cf4 ro rd.md.uuid=3ff29900:0eab9bfa:ea2a674d:f8b33550 rd.md.uuid=5789f86e:02137e41:05147621:b634ff66 console=tty0 nodmraid crashkernel=auto crashkernel=auto rd.md.uuid=f6b88a6b:263fd352:c0c2d7e6:2fe442ac vconsole.font=latarcyrheb-sun16 vconsole.keymap=us LANG=en_US.UTF-8 default_hugepagesz=1G hugepagesz=1G hugepages=55 transparent_hugepage=never
$ cat /sys/module/kvm_intel/parameters/ept
Y
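
For reference, the kernel command line above can be set persistently with grubby. This is only a sketch; the number of huge pages should be sized to the host's memory and the planned VM flavors:

$ sudo grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=55 transparent_hugepage=never"

A reboot of the hypervisor is then needed for the new command line to take effect.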

It may also be advisable to disable tuned for the moment until the bug #1189868 is resolved.

We also configured the XML manually to include the necessary huge pages. This will be available as a flavor or image option when we upgrade to Kilo in a few weeks.

  <memoryBacking>
        <hugepages>
          <page size="1" unit="G" nodeset="0-1"/>
        </hugepages>
  </memoryBacking>

The hypervisor was configured with huge pages enabled. However, we saw a problem with the distribution of huge pages across the NUMA nodes.

$ cat /sys/devices/system/node/node*/meminfo | fgrep Huge
Node 0 AnonHugePages: 311296 kB
Node 0 HugePages_Total: 29
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 4096 kB
Node 1 HugePages_Total: 31
Node 1 HugePages_Free: 2
Node 1 HugePages_Surp: 0

This shows that the pages were not evenly distributed across the NUMA nodes, which would lead to subsequent performance issues. The suspicion is that the Linux boot-up sequence used some pages, making it difficult to find contiguous 1GB blocks for the huge pages. This led us to deploy 2MB pages rather than 1GB for the moment; while this may not be the optimum setting, it allows better optimisation than the 4K pages and still gives some potential for KSM to benefit. These changes had a positive effect, with monitoring showing a clear reduction in system time.
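
As a sketch, 2MB pages can be reserved at runtime, either globally or per NUMA node (the counts below are placeholders and should be sized to the host):

$ echo 28672 | sudo tee /proc/sys/vm/nr_hugepages
$ echo 14336 | sudo tee /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
$ echo 14336 | sudo tee /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages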




At the OpenStack summit in Tokyo, we'll be having a session on Hypervisor Tuning so people are welcome to bring their experiences along and share the various options. Details of the session will appear at https://etherpad.openstack.org/p/TYO-ops-meetup.

Contributions from Ulrich Schwickerath and Arne Wiebalck (CERN) and Sean Crosby (University of Melbourne) have been included in this article along with the help of the LHC experiments to validate the configuration.


by Tim Bell (noreply@blogger.com) at February 07, 2017 07:44 PM

OpenStack CPU topology for High Throughput Computing

We are starting to look at the latest features of OpenStack Juno and Kilo as part of the CERN OpenStack cloud to optimise a number of different compute intensive applications.

We'll break down the tips and techniques into a series of small blogs. A corresponding set of changes to the upstream documentation will also be made to ensure the options are documented fully.

In the modern CPU world, a server consists of multiple levels of processing units.
  • Sockets, where each of the processor chips is inserted
  • Cores, where each processor contains multiple processing units which can run processes in parallel
  • Threads, where (if settings such as SMT are enabled) a core may run multiple processing threads at the expense of sharing that core
The typical hardware used at CERN is a 2-socket system. This provides the optimum price/performance for our typical high throughput applications, which simulate and process events from the Large Hadron Collider. The aim is not to process a single event as quickly as possible but rather to process the maximum number of events within a given time (within the total computing budget available). As the price of processors varies according to performance, the selected systems are often not the fastest possible but the ones which give the best performance/CHF.

A typical example of this approach is in our use of SMT which leads to a 20% increase in total throughput although each individual thread runs correspondingly slower. Thus, the typical configuration is

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Stepping:              4
CPU MHz:               2999.953
BogoMIPS:              5192.93
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31


By default in OpenStack, the virtual CPUs in a guest are allocated as standalone processors. This means that for a 32 vCPU VM, it will appear as

  • 32 sockets
  • 1 core per socket
  • 1 thread per core
As part of ongoing performance investigations, we wondered about the impact of this topology on CPU bound applications.

With OpenStack Juno, there is a mechanism to pass the desired topology. This can be done through flavors or image properties.

The names are slightly different between the two usages: flavors use properties which start with hw: while images use properties which start with hw_.

The flavor configurations are set by the cloud administrators and the image properties can be set by the project members. The cloud administrator can also set maximum values (e.g. hw_cpu_max_cores) so that the project members cannot define values which are incompatible with the underlying resources.
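
As an illustration, the flavor-level equivalent might look like this (the flavor name is hypothetical and administrator credentials are required):

$ openstack flavor set --property hw:cpu_sockets=2 --property hw:cpu_cores=8 --property hw:cpu_threads=2 m1.xlarge

Project members can set the corresponding image properties themselves: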


$ openstack image set --property hw_cpu_cores=8 --property hw_cpu_threads=2 --property hw_cpu_sockets=2 0215d732-7da9-444e-a7b5-798d38c769b5

The VM which is booted then has this configuration reflected.

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2593.748
BogoMIPS:              5187.49
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K

NUMA node0 CPU(s):     0-31

While this gives the possibility to construct interesting topologies, the performance benefits are not clear. The standard High Energy Physics benchmark shows no significant change. Given that there is no direct mapping between the cores in the VM and the underlying physical ones, this may be because the cores are not pinned to the corresponding sockets/cores/threads and thus Linux may be optimising for a virtual configuration rather than the real one.

This work was in collaboration with Sean Crosby (University of Melbourne) and Arne Wiebalck (CERN).

The following documentation reports have been raised
  • Flavors Extra Specs -  https://bugs.launchpad.net/openstack-manuals/+bug/1479270
  • Image Properties - https://bugs.launchpad.net/openstack-manuals/+bug/1480519

by Tim Bell (noreply@blogger.com) at February 07, 2017 07:43 PM

Ed Leafe

API Longevity

How long should an API, once released, be honored? This is a topic that comes up again and again in the OpenStack world, and there are strong opinions on both sides. On one hand are the absolutists, who insist that once a public API is released, it must be supported forever; there is never any justification for either changing or dropping that API. On the other hand are the pragmatists, who think that APIs, like all software, should evolve over time, since the original code may be buggy or the needs of its users may have changed.

I’m not at either extreme. I think the best analogy is that an API is like getting married: you put a lot of thought into it before you take the plunge. You promise to stick with it forever, even when it might be easier to give up and change things. When there are rough spots (and there will be), you work to smooth them out rather than bailing out.

But there comes a time when you have to face the reality that staying in the marriage isn’t really helping anyone, and that divorce is the only sane option. You don’t make that decision lightly. You understand that there will be some pain involved. But you also understand that a little short-term pain is necessary for long-term happiness.

And like a divorce, an API change requires extensive notification and documentation, so that everyone understands the change that is happening. Consumers of an API should never be taken by surprise, and should have as much advance notice as possible. When done with this in mind, an API divorce does not need to be a completely unpleasant experience for anyone.

 

by ed at February 07, 2017 06:19 PM

RDO

Videos from the CentOS Dojo, Brussels, 2017

Last Friday in Brussels, CentOS enthusiasts gathered for the annual CentOS Dojo, right before FOSDEM.

While there was no official videographer for the event, I set up my video camera in the talks that I attended, and so have video of five of the sessions.

First, I attended the session covering RDO CI in the CentOS build management system. I was a little late to this talk, so it is missing the first few minutes.

Video: https://www.youtube.com/watch?v=ZLdiPUHvKSE

Next, I attended an introduction to Foreman by Ewoud Kohl van Wijngaarden.

Video: https://www.youtube.com/watch?v=Zq_hT1bgfjs

Spiros Trigazis spoke about CERN's OpenStack cloud. Unfortunately, the audio is not great in this one.

Video: https://www.youtube.com/watch?v=fz3XIvkf8S4

Nicolas Planel, Sylvain Afchain and Sylvain Baubeau spoke about the Skydive network analyzer tool.

Video: https://www.youtube.com/watch?v=4mk_ROjaVRA

Finally, there was a demo of Cockpit - the Linux management console by Stef Walter. The lighting is a little weird in here, but you can see the screen even when you can't see Stef.

Video: https://www.youtube.com/watch?v=9HbQ8sjolPY

by Rich Bowen at February 07, 2017 03:00 PM

OpenStack Superuser

How to design and implement successful private clouds with OpenStack

A new book aims to help anyone who has a private cloud on the drawing board make it a reality.

Michael Solberg and Ben Silverman wrote “OpenStack for Architects,” a guide that walks you through the major decision points to make effective blueprints for an OpenStack private cloud. Solberg, chief architect at Red Hat, and Silverman, principal cloud architect for OnX Enterprise Solutions, penned the 214-page book available in multiple formats from Packt Publishing. (It will also be available on Amazon in March.)

Superuser talked to Solberg and Silverman about the biggest changes in private clouds, what’s next and where you can find them at upcoming community events.


Who will this book help most?

MS: We wrote the book for the folks who will be planning and leading the implementation of OpenStack clouds – the cloud architects. It answers a lot of the big picture questions that people have when they start designing these deployments – things like “How is this different than traditional virtualization?”, “How do I choose hardware or third-party software plugins?” and “How do I integrate the cloud into my existing infrastructure?” It covers some of the nuts and bolts as well – there are plenty of code examples for unit tests and integration patterns – but it’s really focused at the planning stages of cloud deployment.

What are some of the most common mistakes people make as beginners?

BS: I think that the biggest mistake people make is being overwhelmed by all of the available functionality in OpenStack and not starting with something simple. I’m pretty sure it’s human nature to want to pick all the bells and whistles when they are offered, but in the case of OpenStack it can be frustrating and overwhelming. Once beginners decide what they want their cloud to look like, they tend to get confused by all of the architectural options. While there’s an expectation that users should have a certain architectural familiarity with cloud concepts when working with OpenStack, learning how all of the interoperability works is still a gap for beginners. We’re hoping to bridge that gap with our new book.

What are some of the most interesting use cases now?

MS: The NFV and private cloud use cases are pretty well defined at this point. We’ve had a couple of really neat projects lately in the genomics space where we’re looking at how to best bring compute to large pools of data – I think those are really interesting. It’s also possible that I just think genomics is interesting, though.

How have you seen OpenStack architecture change since you’ve been involved?

MS: We talk about this a bit in the book. The biggest changes happening right now are really around containers. Both the impact of containers in the tenant space and on the control plane. The changes in architecture are so large that we’ll probably have to write a second edition as they get solidified over the next year or two.

Are there any new cases you’re working with (and still building) that you can talk about?

BS: The idea of doing mobile-edge computing using OpenStack as the orchestrator for infrastructure at the mobile edge is really hot right now. It is being led by the new ETSI Mobile-Edge Computing Industry Specification Group and has the backing of about 80 companies.

Not only would this type of OpenStack deployment have to support NFV workloads over the new mobile 5G networks, but it would also have to support specialized workloads that need high bandwidth and low latency geographically close to customers. We could even see Open Compute work into this use case as service providers try to get the most out of edge locations. It has been very cool over the last few years to have seen traditional service providers taking NFV back to the regional or national data center, but it’s even cooler to see that they are now using OpenStack to put infrastructure back at the edge to extend it out to customers.

Ben, curious about your tweet – “You must overcome the mindset that digital transformation is a tech thing, rather than an enterprise-wide commitment”…Why do people fall into this mentality? What’s the best cure for it?

BS: The common misconception for a lot of enterprises is that technology transformation simply takes a transformation of the technology. Unfortunately, that’s not the case when it comes to cloud technologies. Moving from legacy bare metal or virtualization platforms to a true cloud architecture won’t provide much benefit unless business processes and developer culture changes to take advantage of it.

The old adage “build it and they’ll come” doesn’t apply to OpenStack clouds. Getting executive sponsorship and building a grassroots effort in the internal developer community goes a long way towards positive cultural change. I always advise my clients to get small wins for cloud and agile development first and use those groups as cheerleaders for their new OpenStack cloud.

I tell them, “bring others in slowly and collect small wins. If you don’t go slow and get gradual adoption you’ll end up with accelerated rejection and even with executive sponsorship, you could find yourself with a great platform and no tenants.” I have seen this happen again and again simply because of uncontrolled adoption and the wrong workloads or the wrong team was piloted into OpenStack and had bad experiences.

Why is a book helpful now — there’s IRC, mailing lists, documentation, video tutorials etc.?

MS: That was actually one of the biggest questions we had when we sat down to write the book! We’ve tried to create content which answers the kinds of questions that aren’t easily answered through these kinds of short-form documentation. Most of the topics in the book are questions that other architects have asked us and wanted to have a verbal discussion around – either as a part of our day jobs or in the meetups.

BS: There are a lot of ways people can get OpenStack information today. Unfortunately I’ve found that a lot of it is poorly organized, outdated or simply incomplete. I find Google helpful for information about OpenStack topics, but if you type “OpenStack architecture” you end up with all sorts of results. Some of the top results are the official OpenStack documentation pages which, thanks to the openstack-manuals team, are getting a major facelift (go team docs!). However, right below the official documentation are outdated articles and videos from the Cactus and Diablo timeframes. Not useful at all.

What’s on your OpenStack bookshelf?

MS: Dan Radez’s book was one of the first physical books I had bought in a long time. I read it before I started this book to make sure we didn’t cover content he had already covered there. I just finished “Common OpenStack Deployments” as well – I think that’s a great guide to creating a Puppet composition module, which is something we touch briefly on in our book.

BS: Looking at my bookshelf now I can see Shrivastwa and Sarat’s “Learning OpenStack” (full disclosure, I was the technical reviewer), James Denton’s second edition “Learning OpenStack Neutron” and an old copy of Doug Shelley and Amrith Kumar’s “OpenStack Trove.” Like Michael, I’ve got “Common OpenStack Deployments” on order and I’m looking forward to reading it.

And what is the “missing manual” you’d like to see?

BS: I would love to see “Beginner’s Guide to OpenStack Development and the OpenStack Python SDK(s)” I know enough Python to be dangerous and fix some low-hanging bugs, but a book that really dug into the Python libraries with examples and exercises would be pretty cool. It could even contain a getting started guide to help developers get familiar with the OpenStack development tools and procedures.

Are either of you attending the PTG and/or Boston Summit?

BS: I’ll be at the PTG. As a member of the openstack-manuals team, I’m looking forward to having some really productive sessions with our new project team lead (PTL) Alexandra Settle. We’ve already started discussing some of our goals for Pike, so we’re in good shape.

I’ll also be at the Summit in Boston, where I’ve submitted a talk entitled “Taking Capacity Planning to Flavor Town – Planning your flavors for maximum efficiency.” My talk centers on the elimination of dead space in compute, storage and networking resources by deterministic flavor planning. Too many enterprises have weird-sized flavors all residing on the same infrastructure, which leads to strange-sized orphan resources that are never consumed. On a small scale the impact is minimal, but companies with thousands or tens of thousands of cores can recover hundreds of thousands of dollars in wasted CAPEX simply by planning properly.

MS: I’ll be at the Boston Summit – more for catching up with my colleagues in the industry than anything else. You can find me at the Atlanta OpenStack Meetup most months as well.

The authors are also keeping a blog about the book where you can find updates on signings and giveaways.

Cover photo by: Brian Rinker

The post How to design and implement successful private clouds with OpenStack appeared first on OpenStack Superuser.

by Nicole Martinelli at February 07, 2017 12:47 PM

OpenStack in Production

NUMA and CPU Pinning in High Throughput Computing


CERN's OpenStack cloud runs the Juno release on mainly CentOS 7 hypervisors.
Along with previous tuning options described in this blog which can be used on Juno, a number of further improvements have been delivered in Kilo.

Since this release will be installed at CERN during the autumn, we had to configure standalone KVM configurations to test the latest features, in particular around NUMA and CPU pinning.

NUMA features have been appearing in more recent processors, which means memory accesses are no longer uniform. Rather than a single large pool of memory accessed by the processors, the performance of a memory access varies according to whether the memory is local to the processor.


A typical case is where VM 1 is running on CPU 1 and needs a page of memory to be allocated. It is important that the memory allocated by the underlying hypervisor gives VM 1 the fastest possible access in future. Thus, the guest VM kernel needs to be aware of the underlying memory architecture of the hypervisor.

The NUMA configuration of a machine can be checked using lscpu. This shows two NUMA nodes on CERN's standard server configurations (two processors with 8 physical cores and SMT enabled)

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Stepping:              4
CPU MHz:               2257.632
BogoMIPS:              5206.18
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31

Thus, cores 0-7 and 16-23 are attached to the first NUMA node, with the others on the second; the two ranges come from SMT. VMs, however, see a single NUMA node:

NUMA node0 CPU(s): 0-31


First Approach - numad

The VMs on the CERN cloud come in a variety of sizes, so NUMA has a correspondingly varied influence.



Linux provides the numad daemon which provides some automated balancing of NUMA workloads to move memory near to the processor where the thread is running.
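
On a CentOS 7 hypervisor, numad can be installed and enabled along these lines (a sketch):

# yum install -y numad
# systemctl enable numad
# systemctl start numad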

In the case of 8 core VMs, numad on the hypervisor provided a performance gain of 1.6%. However, the effect for larger VMs was much less significant. Looking at the performance of running 4x8 core VMs versus 1x32 core VM, there was significantly more overhead in the large VM case.




Second approach - expose NUMA to guest VM

This can be done using appropriate KVM directives. With OpenStack Kilo, these will be possible via the flavors extra specs and image properties. In the meanwhile, we configured the hypervisor with the following XML for libvirt.

<cpu mode='host-passthrough'>
<numa>
<cell id='0' cpus='0-7' memory='16777216'/>
<cell id='1' cpus='16-23' memory='16777216'/>
<cell id='2' cpus='8-15' memory='16777216'/>
<cell id='3' cpus='24-31' memory='16777216'/>
</numa>
</cpu>

In an ideal world, there would be two cells defined (0-7,16-23 and 8-15,24-31) but KVM currently does not support non-contiguous ranges on CentOS 7 [1]. The guests see the configuration as follows

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Stepping: 4
CPU MHz: 2593.750
BogoMIPS: 5187.50
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
NUMA node2 CPU(s): 16-23
NUMA node3 CPU(s): 24-31

With this approach and turning off numad on the hypervisor, the performance of the large VM improved by 9%.

We also investigated the numatune options but these did not produce a significant improvement.

Third Approach - Pinning CPUs

From the hypervisor's perspective, the virtual machine appears as a single process which needs to be scheduled on the available CPUs. While the NUMA configuration above means that memory access from the processor will tend to be local, the hypervisor may then choose to place the next scheduled clock tick on a different processor. While this is useful in the case of hypervisor over-commit, for a CPU bound application, this leads to less memory locality.

With Kilo, it will be possible to pin a virtual core to a physical one. The same was done using the hypervisor XML as for NUMA.

<cputune>
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="1"/>
<vcpupin vcpu="2" cpuset="2"/>
<vcpupin vcpu="3" cpuset="3"/>
<vcpupin vcpu="4" cpuset="4"/>
<vcpupin vcpu="5" cpuset="5"/>
...

This means that virtual core #1 is always run on physical core #1.
Repeating the large VM test provided a further 3% performance improvement.

The exact topology has been set in a simple fashion. Further investigation into the exact mappings between thread siblings is needed to get the most out of the tuning.

The impact on smaller VMs (8 and 16 cores) also needs to be studied. Optimising for one use case risks affecting other scenarios, and custom configurations for particular VM topologies increase the operational effort of running a cloud at scale. While the changes should be positive, or at minimum neutral, this needs to be verified.

Summary

Exposing the NUMA nodes and using CPU pinning has reduced the large VM overhead with KVM from 12.9% to 3.5%. When the features are available in OpenStack Kilo, these can be deployed by setting up the appropriate flavors with the additional pinning and NUMA descriptions for the different hardware types so that large VMs can be run at a much lower overhead.
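
Once Kilo is in place, a sketch of how this could be expressed as flavor extra specs (the flavor name is hypothetical):

$ openstack flavor set --property hw:numa_nodes=2 --property hw:cpu_policy=dedicated hpc.large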

This work was in collaboration with Sean Crosby (University of Melbourne) and Arne Wiebalck and Ulrich Schwickerath (CERN).

Updates

[1] RHEV does support this with the later QEMU rather than the default in CentOS 7 (http://cbs.centos.org/repos/virt7-kvm-common-testing/x86_64/os/Packages/, version 2.1.2)

by Tim Bell (noreply@blogger.com) at February 07, 2017 09:43 AM

Christopher Smart

Fixing webcam flicker in Linux with udev

I recently got a new Dell XPS 13 (9360) laptop for work and it’s running Fedora pretty much perfectly.

However, when I load up Cheese (or some other webcam program) the video from the webcam flickers. Given that I live in Australia, I had to change the powerline frequency from 60Hz to 50Hz to fix it.

sudo dnf install v4l-utils
v4l2-ctl --set-ctrl power_line_frequency=1

I wanted this to be permanent each time I turned my machine on, so I created a udev rule to handle that.

cat << EOF | sudo tee /etc/udev/rules.d/50-dell-webcam.rules
SUBSYSTEM=="video4linux", \
SUBSYSTEMS=="usb", \
ATTRS{idVendor}=="0c45", \
ATTRS{idProduct}=="670c", \
PROGRAM="/usr/bin/v4l2-ctl --set-ctrl \
power_line_frequency=1 --device /dev/%k", \
SYMLINK+="dell-webcam"
EOF

It’s easy to test. Just turn flicker back on, reload the rules and watch the flicker in Cheese automatically disappear 🙂

v4l2-ctl --set-ctrl power_line_frequency=0
sudo udevadm control --reload-rules && sudo udevadm trigger

Of course I also tested with a reboot.
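
To double-check after a reboot, the control can be read back via the symlink that the rule creates:

v4l2-ctl --device /dev/dell-webcam --get-ctrl power_line_frequency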

It’s easy to do with any webcam, just take a look on the USB bus for the vendor and product IDs. For example, here’s a Logitech C930e (which is probably the nicest webcam I’ve ever used, and also works perfectly under Fedora).

Bus 001 Device 022: ID 046d:0843 Logitech, Inc. Webcam C930e

So you would replace the following in your udev rule:

  • ATTRS{idVendor}=="046d"
  • ATTRS{idProduct}=="0843"
  • SYMLINK+="c930e"

Note that SYMLINK is not necessary, it just creates an extra /dev entry, such as /dev/c930e, which is useful if you have multiple webcams.

by Chris at February 07, 2017 06:56 AM

About

Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology you should add your OpenStack blog.
