July 21, 2017

NFVPE @ Red Hat

Any time in your schedule? Try using a custom scheduler in Kubernetes

I’ve recently been interested in the idea of extending the scheduler in Kubernetes. There are a number of reasons why, but at the top of my list is looking at re-scheduling failed pods based on custom metrics – specifically for high performance, high availability, like we need in telecom. In my search to learn more about it, I discovered the Kube docs for configuring multiple schedulers, and even better – a practical application: a toy scheduler created by the one-and-only kube hero Kelsey Hightower. It’s about a year old and Hightower is on his game, so he was using alpha functionality at the time of authoring. In this article I modernize at least one component to get it running in the present day. Today our goal is to run through the toy scheduler and have it schedule a pod for us. We’ll also dig into Kelsey’s Go code for the scheduler a little bit to get an intro to what he’s doing.
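To give a flavor of what “have it schedule a pod for us” means: in current Kubernetes a pod can request a specific scheduler via the spec.schedulerName field (this replaced the older alpha annotation). A minimal sketch, not taken from the article, with a hypothetical scheduler name:

# create a pod that the default scheduler will ignore; only a custom
# scheduler registered under the matching name will bind it to a node
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: custom-sched-demo
spec:
  schedulerName: toy-scheduler   # hypothetical; must match the custom scheduler's name
  containers:
  - name: nginx
    image: nginx
EOF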

by Doug Smith at July 21, 2017 04:50 PM

OpenStack Superuser

Why large state-owned enterprises are boarding the OpenStack train

For more stories from the fast track, attend OpenStack Days China in Beijing July 24-25.

Often the fastest track to virtualization and cloud computing for large state-owned enterprises is to build their own private clouds with open-source software.

Case in point: China’s Sinorail Information Cloud. This joint effort from Beijing Sinorail Information Engineering Group and Beijing T2Cloud was designed to serve the IT infrastructure for railway construction and related industries. The first phase of large-scale deployment of the private cloud product SRCloud OS/T2Cloud OS with OpenStack and other open-source software launched in early 2017. Beijing Sinorail Information Engineering Group is a subsidiary of China Railway Information Technology Center, providing IT services for railway and other industries. T2Cloud (Cloud Tuteng), a well-known cloud platform product and service provider, is also a longstanding OpenStack proponent in China.

China’s massive railway system is a pillar of local industry: its tracks stretch some 124,000 km (77,000 miles), including 20,000 km (12,500 miles) of high-speed rail lines, more than the rest of the world combined. New investment in railway continues apace at over 800 billion yuan ($123.6 billion) per year. Consequently, the railway’s IT architecture faces an urgent need to be upgraded from mainframes, minicomputers and centralized storage, through blade servers and virtualization with centralized storage, to cloud computing. The high agility of cloud computing can quickly respond to the launches and updates required by new initiatives, and its high reliability ensures the smooth operation of passenger and freight transportation.

Why did they get aboard OpenStack? Rao Wei, deputy general manager of Beijing Sinorail Information Engineering Group, says that when they were evaluating options, the ability to modify hardware and software and avoid vendor lock-in were key requirements. The team also found OpenStack to be a good way to go because of the maturity of the technology, the strength of the community and solid use cases.

Wei was aided on the journey by two excellent teams. To reach the four R&D goals of the Sinorail Information Cloud product (stability, safety, flexibility and usability), T2Cloud and Beijing Sinorail Information Engineering Group worked jointly to combine OpenStack modules with Sinorail’s technology and ecosystem, automatic deployment, automatic detection and other tools, and found a feasible way forward.

Tests proved that the OpenStack product developed jointly by the two teams ran stably with 100,000 cloud hosts across 800 nodes in a single zone in the first phase of the project deployment. No one working on the project had ever seen such a large volume of service deployment, or such overall parallel I/O and pressure on storage, network and bandwidth, on any conventional platform.

In view of potential problems with the original OpenStack components (the authentication system, memory interface, network interface, under-network DHCP service and forwarding service), the Sinorail project prepared well before deployment, including a project preview, environmental pressure testing and code fixes, Wei says.

During the deployment process, the 800 physical nodes were deployed automatically through MagicStack and other automation tools developed by T2Cloud. The entire process took just one day. To ensure the stable operation of customer services, they also migrated a number of application services – such as Web services and file system services in the testing environment – and automatically deployed them, says Fan Weipeng, head of the Sinorail project.

When choosing a partner, Beijing Sinorail Information Engineering Group evaluated a number of domestic and foreign OpenStack vendors before going with T2Cloud. A deciding factor was the hands-on approach focused on helping customers truly grasp open-source technology and apply it independently in production, rather than simply acting as a go-between. In nearly two years of the joint R&D process, both sides invested heavily to reap stability and reliability.

The Sinorail project may offer a glimpse of where OpenStack is headed in China. It smoothed over what was seen as the business predicament of difficult OpenStack deployment and maintenance, explored the viability of creating an OpenStack product and demonstrated how vendors and users can work closely together. “We’re happy to share the experience and contribute to OpenStack,” concludes Wei, adding that he hopes to join more R&D projects that will promote the domestic application of OpenStack while upholding the spirit of openness.

 

This post is a translation of an original article by Xiong Wei.

Cover Photo // CC BY NC

The post Why large state-owned enterprises are boarding the OpenStack train appeared first on OpenStack Superuser.

by Superuser at July 21, 2017 11:30 AM

OpenStack Blog

User Group Newsletter – July 2017

Sydney Summit

We’re getting excited as the Sydney Summit draws closer. Don’t miss out on your early bird tickets; sales end September 1. Find your Summit pocket guide here. It includes information about where to stay, featured speakers, a Summit timeline, the OpenStack Academy and much more.

An important note regarding travel: all non-Australian residents will need a visa to travel to Australia (including United States citizens). Click here for more information.

Travel Support Program

Need some support to make the trip? You can apply for the Travel Support Program. Superuser has a great article with handy tips to help you complete your application. Find the Superuser article here.

Superuser Awards

The Superuser Awards recognize teams using OpenStack to meaningfully improve business and differentiate in a competitive industry, while also contributing back to the community. Nominations for the OpenStack Summit Sydney Superuser Awards are open and will be accepted through midnight Pacific Time September 8. Find out more information via this Superuser article. 

User Survey

Make your voice heard in the User Survey. It’s available in seven languages, including Chinese (traditional and simplified), Japanese, Korean, German and Indonesian. Submissions close on August 11. Complete it here.

User Committee Elections

The User Committee Elections are right around the corner. Active user contributors (AUCs), including operators, contributors, event organizers and working group members, are invited to apply. Nominations open July 31 and close August 11. Find out all you need to know with this Superuser article.

Boston Summit recap

We hope you all enjoyed the Boston Summit in May. Catch up on the sessions you weren’t able to see, via the OpenStack Foundation’s YouTube channel.

Certified OpenStack Administrator exam

OpenStack skills are in high demand as thousands of companies around the world adopt and productize OpenStack. The COA is the first professional certification offered by the OpenStack Foundation. It’s designed to help companies identify top talent in the industry, and help job seekers demonstrate their skills. For more information, head to the COA website. You can also check out this video. 

Call for Papers

There are a number of calls for papers open for upcoming events:

OpenStack Days

There are a number of upcoming OpenStack Days this year across the globe. See the full calendar here.

New User Groups

Welcome to our new user groups!

Looking for your local user group or want to start one in your area? Head to the groups portal.

Looking for speakers?

If you’re looking for speakers for your upcoming event or meetup, check out the OpenStack Speakers Bureau. It contains a fantastic repository of contacts, including information such as their past talks, languages spoken, country of origin and travel preferences.

Jobs Portal

Find that next great opportunity via the OpenStack Jobs Portal.

Are you following the Foundation on social media? Check out each of our channels today.

Twitter, Linkedin, Facebook, YouTube

 

Contributing to the User Group Newsletter.

If you’d like to contribute a news item for next edition, please submit to this etherpad.

Items submitted may be edited down for length, style and suitability.

by Sonia Ramza at July 21, 2017 06:56 AM

July 20, 2017

RDO

Beth Elwell talks about her work on OpenStack Horizon in the Ocata cycle - OpenStack PTG

Beth Elwell talks about her work on OpenStack Horizon in the Ocata cycle, at the OpenStack PTG in Atlanta.

by Rich Bowen at July 20, 2017 08:45 PM

Miguel Ángel Ajo - OpenStack Neutron QOS - OpenStack PTG

Miguel Ángel Ajo talks about the work done on the OpenStack Neutron Quality of Service plugin in the Ocata release, at the OpenStack PTG in Atlanta.

by Rich Bowen at July 20, 2017 08:14 PM

Matthew Treinish

Dirty Clouds Done Cheap

I recently gave a presentation at the OpenStack Summit in Boston with the same title. You can find a video of the talk here: https://www.openstack.org/videos/boston-2017/dirty-clouds-done-dirt-cheap. This blog post will cover the same project, but in a lot more detail than I could fit into the presentation.

Just a heads up, this post is long! I try to cover every step of the project with all the details I could remember. It probably would have made sense to split things up into multiple posts, but I wrote it in a single sitting and doing that felt weird. If you’re just looking for a quicker overview, I recommend watching the video of my talk instead.


The Project Scope

When I was in college I had a part-time job as a sysadmin at an HPC research lab in the aerospace engineering department. I was responsible for all aspects of the IT in the lab, from the workstations and servers to the HPC clusters, and I often had to deploy new software with no prior knowledge about it. I managed to muddle through most of the time by reading the official docs and frantically searching Google when I encountered issues.

Since I started working on OpenStack I often think back to my work in college and wonder: if I had been tasked with deploying an OpenStack cloud back then, would I have been able to? As a naive college student with no knowledge of OpenStack, would I have succeeded in deploying it by myself? Since I had no knowledge of configuration management (like puppet or chef) back then, I would have gone about it by installing everything by hand. Basically, the open question from that idea is: how hard is it actually to install OpenStack by hand using the documentation and Google searches?

Aside from the interesting thought exercise, I have also wanted a small cloud at home for a couple of reasons. I maintain a number of servers at home that run a bunch of critical infrastructure, and for some time I’ve wanted to virtualize that infrastructure, mainly for the increased flexibility and potential reliability improvements; running things off a residential ISP and power isn’t the best way to run a server with decent uptime. Besides virtualizing some of my servers, it would be nice to have the extra resources for my upstream OpenStack development: I often do not have the resources available to run devstack or integration tests locally and have to rely on upstream testing.

So after the Ocata release I decided to combine these 2 ideas and build myself a small cloud at home. I would do it by hand (i.e. no automation or config management) to test out how hard it would be. I set myself a strict budget of $1500 USD (the rough cost of my first desktop computer in middle school, an IBM Netvista A30p that I bought with my Bar Mitzvah money) to acquire hardware. This was mostly just a fun project for me so I didn’t want to spend an obscene amount of money. $1500 USD is still a lot of money, but it seemed like a reasonable amount for the project.

However, I decided to take things a step further than I originally planned and build the cloud using the release tarballs from http://tarballs.openstack.org/. My reasoning was to test out how hard it would be to take the raw code we release as a community and turn it into a working cloud. This basically invalidated the project as a test of my thought exercise of deploying a cloud back in college (since I definitely would have just used my Linux distro’s packages back then), but it made the exercise more relevant for me personally as an upstream OpenStack developer. It would give me insight into where there are gaps in our released code and how we could start to fix them.

Building the Cloud

Acquiring the Hardware

The first step for building the cloud was acquiring the hardware. I had a very tight budget and it basically precluded buying anything new; the cheapest servers you can buy from a major vendor would pretty much eat up my budget for a single machine. I also considered building a bunch of cheap desktops for the project and putting those together as a cloud (I didn’t actually need server class hardware for this), but for the cost the capacity was still limited. Since I was primarily building a compute cloud to provide me with a pool of servers to allocate, my first priority was the number of CPU cores in the cloud, which would give me the flexibility to scale any applications I was running on it. With that in mind I decided on the following priority list for the hardware:

  1. Number of Cores
  2. Amount of RAM
  3. Speed of CPU

The problem with building with desktop CPUs is that (at the time I was assembling pieces) the core count per dollar was not really that high for any of the desktop processors. Another popular choice for home clouds is the Intel NUC, but these suffer from the same problem: the NUCs use laptop processors, and while reasonably priced you’re still only getting a dual or quad core CPU for a few hundred dollars.

It turns out the best option I found for my somewhat bizarre requirements was to buy used hardware. A search of eBay shows a ton of servers from 8 or 9 years ago that are dirt cheap. After searching through my various options I settled on the old Dell PowerEdge R610, which is a dual socket machine. The ones I ordered came with 2 Intel Xeon E5540 CPUs, for a total of 8 physical cores (or 16 virtual cores if you count HyperThreading/SMT) per machine. The machines also came with 32 GB of RAM and 2x 146GB SAS hard drives. The best part though was that each machine was only $215.56 USD. This gave me plenty of room in the budget, so I bought 5 of them. After shipping this ended up costing only $1,230.75, which gave me enough wiggle room for the other parts I’d need to make everything work. The full hardware specs from the eBay listing were:

The other great thing about these servers was that I actually had a rack full of basically the same exact servers at the lab in college. The ones I had back in 2010 were a little bit slower and had half the RAM, but were otherwise the same. I configured those servers as a small HPC cluster my last year at college, so I was very familiar with them. Although back then those servers were over 10x the cost of what I was paying for them on eBay now.

The only real downside of this choice was speed: the Xeon E5540 is incredibly slow by today’s standards. But because of my limited budget, speed was something I couldn’t really afford.

Assembling the Hardware

After waiting a few days the servers were delivered. That was a fun day: the FedEx delivery person didn’t bother to ring the door bell. Instead I heard a big thud outside and found that they had left all the boxes in front of my apartment door. Fortunately I was home and heard them throw the boxes on the ground, because it was raining that day, and leaving my “new” servers out in the rain all day would have been a less than ideal way to start the project. It also made quite the commotion, and several of my neighbors came out to see what was going on and watched me as I took the boxes inside.

After getting the boxes inside my apartment and unboxed, I stacked them on my living room table:


My next problem was where to put the servers and how to run them. I looked at buying a traditional rack, however they were a bit too pricey (even on eBay). Just a 10U rack looked like it would cost over $100 USD, and after shipping that wouldn’t leave me much room if I needed something else, so I decided not to go that route. Then I remembered hearing about something called a LackRack a few years ago. It turns out the IKEA Lack table has a 19 inch width between the legs, which is the same as a rack. They also only cost $9.99, which made it a much more economical choice compared to a more traditional rack. While I could just put the table on the floor and be done with it, I was planning to put the servers in my “data closet” (which is just a colorful term for my bedroom closet where I store servers and clothing) and I didn’t want to deal with having to pick up the “rack” every time I needed to move it. So I decided to get some casters and mount them to the table so I could just push the servers around.

Once I got the table delivered, which took a surprisingly long time, I was able to mount the casters and rack the servers. As I put each server on the table I was able to test it out (I only had a single power cable at the time, so I went one at a time). It turns out that each server was slightly different from the description and had several issues:

  • 4x8GB of RAM not 8x4GB
  • Memory installed in wrong slots
  • Dead RAID controller battery
  • Came with 15k RPM hard drives not 10k RPM

Also, the company that is “refurbishing” these old servers from whatever datacenter threw them away totally strips the servers down to the minimum possible unit. For example, the management controller was removed, as was the redundant power supply. Both of these were standard features from Dell when these servers were new. Honestly, it makes sense: the margins on reselling old servers can’t be very high, so the company is trying to make a little profit. I also didn’t really need anything they took out, as long as the servers still booted (although that management controller would have been nice).

Once I put all 5 servers on the rack and got everything mounted, it turned out I also needed a bunch of cables and another power strip to power all 5 at once. So I placed an order with Monoprice for the necessary bits, and once they arrived I wired everything up in the data closet.

After everything was said and done, the final bill of materials for all the hardware was:

Part Name | Link | Price | Number | Shipping & Tax | Total
Dell PowerEdge R610 Virtualization Server 2.53GHz 8-Core E5540 32GB 2x146G PERC6 | http://www.ebay.com/itm/191545700823 | $215.56 | 5 | $152.95 | $1,230.75
LACK Side table, yellow | http://www.ikea.com/us/en/catalog/products/40104270/#/10324278 | $9.99 | 1 | $13.65 | $23.64
6 Outlet Power Strip | https://www.monoprice.com/product?c_id=109&cp_id=10907&cs_id=1090703&p_id=13692&seq=1&format=2 | $7.90 | 1 | $10.49 | $18.39
3ft 16AWG Power Cord | https://www.monoprice.com/Product?p_id=5285 | $1.60 | 5 | $0.00 | $8.00
FLEXboot Series Cat6 24AWG UTP Ethernet Network Patch Cable, 10ft Orange | https://www.monoprice.com/product?p_id=9808 | $1.76 | 10 | $0.00 | $17.60
Casters | https://www.amazon.com/gp/product/B01FJ97E64/ref=od_aui_detailpages00?ie=UTF8&psc=1 | $29.99 | 1 | $0.00 | $29.99
Grand Total: | | | | | $1,328.37

Installing the Operating System

After getting a working set of hardware, the next step was to install the operating system on the servers. As I decided in the original project scope, I was planning to follow the official install guide as much as possible, so my operating system choice would be dictated by those covered in the guide: the 3 Linux distributions documented were OpenSUSE/SLES, RHEL/CentOS, and Ubuntu. Of those 3 my personal choice was Ubuntu, which I find the easiest to deal with. Although, looking back on it now, if I were doing this install back in college I definitely would have used RHEL; Georgia Tech had a site license for RHEL and a lot of the software we had commercial licenses for was only supported on RHEL. But my preference today between those 3 options is Ubuntu.

I created a bootable USB stick for Ubuntu Server 16.04 and proceeded to do a basic install on each server (one at a time). The install itself just used the defaults; the only option I made sure was present was the OpenSSH server. This way, once I finished the initial install I didn’t have to sit in front of the server to do anything, and could install any other packages I needed afterwards from the comfort of my home office. For the hostname I picked altocumulus, because I think clouds should be named after clouds. Although, after I finished the project I got a bunch of better suggestions for the name, like closet-cloud or laundry-cloud.

It’s worth pointing out that if the servers had come with the management controller installed, this step would have been a lot easier. I could have just used it to mount the installer image and run everything from the virtual console, without having to sit in front of each server to start the install. But despite this it only took an hour or so to perform the install on all the servers. With the installs complete it was time to start the process of putting OpenStack on each server and creating my cloud.

Installing OpenStack

With the operating system installed it was time to start building the servers out. Given my limited hardware capacity, just 40 physical cores and 160GB of RAM, I decided that I didn’t want to sacrifice 1/5 of that capacity for a dedicated controller node, so I was going to set up the controller as a compute node as well. My goal for this project was to build a compute cloud, so all I was concerned about was installing the set of OpenStack projects required to achieve that. I didn’t have a lot of storage (the 2 146GB disks came configured out of the box with RAID 1 and I never bothered to change that) so providing anything more than ephemeral storage for the VMs wasn’t really an option.

OpenStack is a large project with a ton of different sub-projects (the complete list of official projects can be found here), and I find some people have trouble figuring out exactly where to get started, or which projects they need for a given configuration. The OpenStack Foundation actually has a page with a bunch of sample service selections by application. The OpenStack Technical Committee also maintains a list of projects needed for the compute starter kit, which was exactly what I was looking for. The only potential problem is the discoverability of that information; it kinda feels like a needle in a haystack if you don’t know where to look.

It also turns out the install guide is mostly concerned with building a basic compute cloud (it also includes using cinder for block storage, but I just skipped that step), so even if I didn’t know the components I needed I would have been fine just reading the docs. The overview section of the docs covers this briefly, but doesn’t go into much detail.

The basic service configuration I was planning to go with was:

With a rough idea of how I was planning to set up the software, I started following the install guide on setting up the servers. https://docs.openstack.org/ocata/install-guide-ubuntu/environment.html# walks you through setting up all the necessary operating system level pieces, like configuring the network interfaces and NTP. It also goes over installing and configuring the service prerequisites like MySQL, RabbitMQ, and memcached. For this part I actually found the docs really easy to follow and very useful. Things were explained clearly and mostly it was just copying and pasting the commands to set things up, but I never felt like I was blindly doing anything for the base setup.

Installing and Configuring the OpenStack Components

After getting the environment for running OpenStack configured it was time to start installing the OpenStack components. Keystone is a requirement for all the other OpenStack services, so you install it first. This is where I hit my first issue, because I decided to use the release tarballs for the install. The install guide assumes you’re using packages from your Linux distribution to install OpenStack, so when I got to the second step in the Installing Keystone section of the install guide it said to run “apt install keystone”, which I didn’t want to do (although it definitely would have made my life easier if I did).

Installing From Tarballs

It turns out there isn’t actually any documentation anywhere that concisely explains the steps required to install an OpenStack component on your system from source. I started searching on Google to try and find any guides. The first hit was a series of blog posts on the Rackspace developer blog on installing OpenStack from source. However, a quick look showed it was quite out of date, especially for the latest version of OpenStack, Ocata, which I was deploying. Also, some of the steps documented there conflicted with the configuration recommended in the install guide. The other results I found recommended that you look at devstack or use automation project X to accomplish this goal; both of these were outside the scope of what I wanted to do for this project. So for the tarball install step I decided to ignore the premise of just following the install guide and instead used my experience working on OpenStack to do the following steps to install the projects:

1. Download the service tarballs. I found the releases page has a good index by project to get the latest tarball for each project. Then extract that tarball to an easy to remember location. (I created a tarballs directory in my home directory to store them all)
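As a rough sketch of what this step can look like (the version number and URL here are illustrative assumptions; check the releases page for the actual latest Ocata tarball of each project):

mkdir -p ~/tarballs && cd ~/tarballs
# fetch the keystone release tarball and extract it next to the others
wget https://tarballs.openstack.org/keystone/keystone-11.0.0.tar.gz
tar -xzf keystone-11.0.0.tar.gz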

2. Create the service user for each project. I ran:

useradd -r -M $service

3. Create the /etc and /var/lib directories for the service. For example on the controller node I used the following for loop in bash to do this for all the services:

for proj in keystone glance nova neutron ; do
    sudo mkdir /etc/$proj
    sudo mkdir /var/lib/$proj
    sudo chown -R $proj:$proj /etc/$proj /var/lib/$proj
done

4. Install the binary package requirements for the project. These are things like libvirt for nova, or libssl: basically anything you need to have installed to either build the python packages or to run the service. The problem here is that for most projects this is not documented anywhere. Most of the projects include a bindep.txt which can be used with the bindep project (or just read manually like I did) to show the distro package requirements on several distributions, but few projects use it for this purpose; instead it’s often used just for the requirements for setting up a unit (or functional) test environment. I also didn’t find this information in any of the developer documentation for the projects. This means you’re probably stuck with trial and error here. When you get to step 6 below it will likely fail with an error that a library header is missing and you’ll need to find the package and install that. Or when you run the service, something it’s calling out to will be missing and you’ll have errors in the service log until you install that missing dependency and restart the service.
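Where a project does ship a usable bindep.txt, a sketch of how you might lean on it (assuming bindep is installed with pip and you run it from the extracted tarball directory; the path is hypothetical):

pip install bindep
cd ~/tarballs/keystone-11.0.0        # directory containing bindep.txt
bindep -b                            # print any missing distro packages, one per line
sudo apt install $(bindep -b)        # install whatever bindep reported as missing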

5. Copy the data files from etc/ in the tarball into /etc/$service for the project. The python packaging ecosystem does not provide a way for packages to install anything outside of the python lib/ directories. This means that the required configuration files (like policy.json files or api paste.ini files) have to be copied manually from the tarball.
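For example, for keystone on the controller this step might look something like the following (again assuming the tarball was extracted under ~/tarballs):

# copy the sample config, paste.ini and policy files shipped in the tarball
sudo cp -r ~/tarballs/keystone-11.0.0/etc/* /etc/keystone/
sudo chown -R keystone:keystone /etc/keystone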

6. After you do all of those steps you can use pip to install the tarball. One thing to note here is that you want to use constraints when you run pip. This is something I forgot when installing the first few services and it caused me a ton of headaches later in the process. You can avoid all of those potential problems up front by just running:

pip install -U -c "https://git.openstack.org/cgit/openstack/requirements/plain/upper-constraints.txt?h=stable/ocata" $PATH_TO_EXTRACTED_TARBALL

If you’re using a different OpenStack release just replace “ocata” at the end of the url with that release name.

I wrote down these steps after I did the install, mostly based on all of the issues I had during the install process. As you read through the rest of this post you’ll see that most of the issues I encountered could have been completely avoided if I had done all of these up front.

It’s also worth noting that all of these steps are provided by the distro packages for OpenStack. This is exactly the role that packaging plays for users, and I was just going through the motions here because I decided to use tarballs. Python packages aren’t really designed for use in systems software and have a lot of limitations beyond the basic case of: put my python code in the place where python code lives. If you want more details on this Clark Boylan gave a good talk on this topic at the OpenStack summit in Boston.

I have also been trying to make a push to start documenting these things in the project developer docs so it’s not an exercise in misery for anyone else wanting to install from source. But, I’ve been getting push back on this because most people seem to feel like it’s a low priority and most people will just use packages. (and packagers seem to have already figured out the pattern for building things)

Creating systemd unit files

One thing that isn’t strictly a requirement when installing from source is creating systemd unit files (or init scripts if you’re lucky enough to have a distro that still supports using SysV init). Creating a systemd unit file for each daemon process you’ll be running is helpful so you don’t have to manually run the command for each daemon. When I built the cloud I created a unit file for each daemon I ran, on both the controller and all of the compute nodes. This let me configure each service to start automatically on boot, and it also encoded the command for starting the daemons, so I could treat them like any other service running on the system. This is another thing that distro packages provide for you, but that you’ll have to do yourself when building from source.

As an example, this is the contents of my nova-api systemd unit file, which I put in /etc/systemd/system/nova-api.service:

[Unit]
Description=OpenStack Nova API
After=network.target

[Service]
ExecStart=/usr/local/bin/nova-api --config-file /etc/nova/nova.conf
User=nova
Group=nova

[Install]
WantedBy=multi-user.target

All the other services follow this same format, except for anything running under uwsgi (like keystone, more on that in the next section); you can refer to the uwsgi docs for more information on that.
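With a unit file like this in place, hooking the daemon into systemd is the usual routine, something along these lines:

sudo systemctl daemon-reload      # pick up the new unit file
sudo systemctl enable nova-api    # start automatically on boot
sudo systemctl start nova-api
journalctl -u nova-api            # check the logs if the service doesn't come up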

Configuring Keystone

With the formula worked out for how to install from tarballs, I was ready to continue following the install guide. The only other issue I had was getting the wsgi script to run under apache. By default keystone ships as a wsgi script that requires a web server to run it. The install guide doesn’t cover this because the distro packages do the required setup for you, but because I was installing from tarballs I had to figure out how to do this myself. Luckily the keystone docs provide a guide on how to do this, and sample config files are included in the tarball. The rest of configuring keystone was really straightforward: the keystone.conf only required 2 configuration options (one for the database connection info and the other for the token type). After setting those I had to run a handful of commands to update the database schema and then populate it with some initial data. It’s not worth repeating all the commands here, since you can just read the keystone section of the install guide. In my case I did encounter one issue when I first started the keystone service: I hit a requirements mismatch which prevented keystone from starting:

2017-03-29 15:27:01.478 26833 ERROR keystone Traceback (most recent call last):
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/bin/keystone-wsgi-admin", line 51, in <module>
2017-03-29 15:27:01.478 26833 ERROR keystone     application = initialize_admin_application()
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/keystone/server/wsgi.py", line 132, in initialize_admin_application
2017-03-29 15:27:01.478 26833 ERROR keystone     config_files=_get_config_files())
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/keystone/server/wsgi.py", line 69, in initialize_application
2017-03-29 15:27:01.478 26833 ERROR keystone     startup_application_fn=loadapp)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/keystone/server/common.py", line 50, in setup_backends
2017-03-29 15:27:01.478 26833 ERROR keystone     res = startup_application_fn()
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/keystone/server/wsgi.py", line 66, in loadapp
2017-03-29 15:27:01.478 26833 ERROR keystone     'config:%s' % find_paste_config(), name)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/keystone/version/service.py", line 53, in loadapp
2017-03-29 15:27:01.478 26833 ERROR keystone     controllers.latest_app = deploy.loadapp(conf, name=name)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
2017-03-29 15:27:01.478 26833 ERROR keystone     return loadobj(APP, uri, name=name, **kw)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
2017-03-29 15:27:01.478 26833 ERROR keystone     return context.create()
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2017-03-29 15:27:01.478 26833 ERROR keystone     return self.object_type.invoke(self)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 144, in invoke
2017-03-29 15:27:01.478 26833 ERROR keystone     **context.local_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/util.py", line 55, in fix_call
2017-03-29 15:27:01.478 26833 ERROR keystone     val = callable(*args, **kw)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/urlmap.py", line 31, in urlmap_factory
2017-03-29 15:27:01.478 26833 ERROR keystone     app = loader.get_app(app_name, global_conf=global_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 350, in get_app
2017-03-29 15:27:01.478 26833 ERROR keystone     name=name, global_conf=global_conf).create()
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 362, in app_context
2017-03-29 15:27:01.478 26833 ERROR keystone     APP, name=name, global_conf=global_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 450, in get_context
2017-03-29 15:27:01.478 26833 ERROR keystone     global_additions=global_additions)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 559, in _pipeline_app_context
2017-03-29 15:27:01.478 26833 ERROR keystone     APP, pipeline[-1], global_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 454, in get_context
2017-03-29 15:27:01.478 26833 ERROR keystone     section)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 476, in _context_from_use
2017-03-29 15:27:01.478 26833 ERROR keystone     object_type, name=use, global_conf=global_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 406, in get_context
2017-03-29 15:27:01.478 26833 ERROR keystone     global_conf=global_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 296, in loadcontext
2017-03-29 15:27:01.478 26833 ERROR keystone     global_conf=global_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 328, in _loadegg
2017-03-29 15:27:01.478 26833 ERROR keystone     return loader.get_context(object_type, name, global_conf)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 620, in get_context
2017-03-29 15:27:01.478 26833 ERROR keystone     object_type, name=name)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 640, in find_egg_entry_point
2017-03-29 15:27:01.478 26833 ERROR keystone     pkg_resources.require(self.spec)
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 943, in require
2017-03-29 15:27:01.478 26833 ERROR keystone     needed = self.resolve(parse_requirements(requirements))
2017-03-29 15:27:01.478 26833 ERROR keystone   File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 834, in resolve
2017-03-29 15:27:01.478 26833 ERROR keystone     raise VersionConflict(dist, req).with_context(dependent_req)
2017-03-29 15:27:01.478 26833 ERROR keystone ContextualVersionConflict: (requests 2.13.0 (/usr/local/lib/python2.7/dist-packages), Requirement.parse('requests!=2.12.2,!=2.13.0,>=2.10.0'), set(['oslo.policy']))

This was caused solely because I forgot to use pip constraints at first when I started installing the controller node (I remembered later). Pip doesn’t have a dependency solver and just naively installs packages in the order it’s told, which causes all sorts of conflicts if 2 packages have the same requirement with different versions (even if there is overlap and a correct version could be figured out). Using constraints like I recommended before would have avoided this. But after resolving the conflict keystone worked perfectly and I was ready to move on to the next service.
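For a conflict like the one above, one way out (a sketch, not necessarily the exact command I ran) is to re-run pip for the offending package with the same Ocata constraints file, which pulls it back to a version every consumer agrees on:

pip install -U -c "https://git.openstack.org/cgit/openstack/requirements/plain/upper-constraints.txt?h=stable/ocata" requests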

Installing Glance

The next service to install following the install guide is Glance. The process for configuring glance was pretty straightforward, and just as with keystone it’s not worth repeating all the steps from the install guide section on Glance. At a high level you just create the database in MySQL, configure glance with the details for connecting to MySQL, connecting to Keystone, and how to store images, then run the DB schema migrations to set the schema for the MySQL database and create the endpoint and service users in keystone. After going through all the steps I did encounter one problem in Glance when I first started it up. The glance log had this traceback:

2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data Traceback (most recent call last):
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance/api/v2/image_data.py", line 116, in upload
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     image.set_data(data, size)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance/domain/proxy.py", line 195, in set_data
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     self.base.set_data(data, size)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance/notifier.py", line 480, in set_data
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     _send_notification(notify_error, 'image.upload', msg)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     self.force_reraise()
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     six.reraise(self.type_, self.value, self.tb)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance/notifier.py", line 427, in set_data
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     self.repo.set_data(data, size)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance/api/policy.py", line 192, in set_data
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     return self.image.set_data(*args, **kwargs)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance/quota/__init__.py", line 304, in set_data
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     self.image.set_data(data, size=size)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance/location.py", line 439, in set_data
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     verifier=verifier)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance_store/backend.py", line 453, in add_to_backend
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     verifier)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance_store/backend.py", line 426, in store_add_to_backend
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     verifier=verifier)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data   File "/usr/local/lib/python2.7/dist-packages/glance_store/capabilities.py", line 223, in op_checker
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data     raise op_exec_map[op](**kwargs)
2017-03-29 16:21:52.038 29647 ERROR glance.api.v2.image_data StoreAddDisabled: Configuration for store failed. Adding images to this store is disabled.

I forgot to create the /var/lib/glance dir, so there was no directory to store the images in. Again, this is something that would have been fixed if I had followed the steps I outlined in the installing from tarballs section. But after creating the directory everything worked.
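The fix amounted to nothing more than creating the directory and giving it to the glance service user, along the lines of:

sudo mkdir /var/lib/glance
sudo chown glance:glance /var/lib/glance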

One thing I do want to note here is that I have a small issue with the verification steps for Glance outlined in the install guide. The steps there don’t really go far enough to verify that the uploaded image was actually stored properly, just that glance created the image. This was a problem I hit later in the installation, and I could have caught it earlier if the verification steps had instructed you to download the image from glance and compare it to the source image.
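Something along these lines would have caught it (a sketch using the CirrOS image from the install guide; the filenames are illustrative):

# pull the image back out of glance and compare checksums against the source file
openstack image save --file cirros-verify.img cirros
md5sum cirros-0.3.5-x86_64-disk.img cirros-verify.img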

Installing Nova

The next service in the install guide is Nova. Nova was a bit more involved compared to Glance or Keystone, but it has more moving parts so that’s understandable. Just as with the other services, refer to the install guide section for Nova for all the step by step details; there are more steps for nova in general so it’s not worth even outlining the high level flow here. One thing you’ll need to be aware of is that Nova includes 2 separate API services that you’ll be running, the Nova API and the Placement API. The Placement API is a recent addition (since Newton) which is used to provide data for the scheduling logic and is a completely self contained service. Just like keystone, the placement API only ships as a wsgi script. But unlike keystone there was no documentation about the install process (this has changed, or is in progress at least) and no example config files provided. It’s pretty straightforward to adapt what you used for keystone, but this was another thing I had to figure out on my own.

After getting everything configured according to the install guide I hit a few little things that I needed to fix. The first was that I forgot to create a state directory that I specified in the config file:

2017-03-29 17:46:28.176 32263 ERROR nova Traceback (most recent call last):
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/bin/nova-api", line 10, in <module>
2017-03-29 17:46:28.176 32263 ERROR nova     sys.exit(main())
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/cmd/api.py", line 59, in main
2017-03-29 17:46:28.176 32263 ERROR nova     server = service.WSGIService(api, use_ssl=should_use_ssl)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/service.py", line 311, in __init__
2017-03-29 17:46:28.176 32263 ERROR nova     self.app = self.loader.load_app(name)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/wsgi.py", line 497, in load_app
2017-03-29 17:46:28.176 32263 ERROR nova     return deploy.loadapp("config:%s" % self.config_path, name=name)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
2017-03-29 17:46:28.176 32263 ERROR nova     return loadobj(APP, uri, name=name, **kw)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
2017-03-29 17:46:28.176 32263 ERROR nova     return context.create()
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2017-03-29 17:46:28.176 32263 ERROR nova     return self.object_type.invoke(self)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 144, in invoke
2017-03-29 17:46:28.176 32263 ERROR nova     **context.local_conf)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/util.py", line 55, in fix_call
2017-03-29 17:46:28.176 32263 ERROR nova     val = callable(*args, **kw)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/urlmap.py", line 160, in urlmap_factory
2017-03-29 17:46:28.176 32263 ERROR nova     app = loader.get_app(app_name, global_conf=global_conf)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 350, in get_app
2017-03-29 17:46:28.176 32263 ERROR nova     name=name, global_conf=global_conf).create()
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2017-03-29 17:46:28.176 32263 ERROR nova     return self.object_type.invoke(self)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 144, in invoke
2017-03-29 17:46:28.176 32263 ERROR nova     **context.local_conf)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/util.py", line 55, in fix_call
2017-03-29 17:46:28.176 32263 ERROR nova     val = callable(*args, **kw)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/auth.py", line 57, in pipeline_factory_v21
2017-03-29 17:46:28.176 32263 ERROR nova     return _load_pipeline(loader, local_conf[CONF.api.auth_strategy].split())
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/auth.py", line 38, in _load_pipeline
2017-03-29 17:46:28.176 32263 ERROR nova     app = loader.get_app(pipeline[-1])
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 350, in get_app
2017-03-29 17:46:28.176 32263 ERROR nova     name=name, global_conf=global_conf).create()
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2017-03-29 17:46:28.176 32263 ERROR nova     return self.object_type.invoke(self)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 146, in invoke
2017-03-29 17:46:28.176 32263 ERROR nova     return fix_call(context.object, context.global_conf, **context.local_conf)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/util.py", line 55, in fix_call
2017-03-29 17:46:28.176 32263 ERROR nova     val = callable(*args, **kw)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/__init__.py", line 218, in factory
2017-03-29 17:46:28.176 32263 ERROR nova     return cls()
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/compute/__init__.py", line 31, in __init__
2017-03-29 17:46:28.176 32263 ERROR nova     super(APIRouterV21, self).__init__()
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/__init__.py", line 243, in __init__
2017-03-29 17:46:28.176 32263 ERROR nova     self._register_resources_check_inherits(mapper)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/__init__.py", line 259, in _register_resources_check_inherits
2017-03-29 17:46:28.176 32263 ERROR nova     for resource in ext.obj.get_resources():
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/compute/cloudpipe.py", line 187, in get_resources
2017-03-29 17:46:28.176 32263 ERROR nova     CloudpipeController())]
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/compute/cloudpipe.py", line 48, in __init__
2017-03-29 17:46:28.176 32263 ERROR nova     self.setup()
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/openstack/compute/cloudpipe.py", line 55, in setup
2017-03-29 17:46:28.176 32263 ERROR nova     fileutils.ensure_tree(CONF.crypto.keys_path)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/fileutils.py", line 40, in ensure_tree
2017-03-29 17:46:28.176 32263 ERROR nova     os.makedirs(path, mode)
2017-03-29 17:46:28.176 32263 ERROR nova   File "/usr/lib/python2.7/os.py", line 157, in makedirs
2017-03-29 17:46:28.176 32263 ERROR nova     mkdir(name, mode)
2017-03-29 17:46:28.176 32263 ERROR nova OSError: [Errno 13] Permission denied: '/usr/local/lib/python2.7/dist-packages/keys'

This was simple to fix: all I had to do was create the directory and set the owner to the service user. The second issue was my old friend, the requirements mismatch:

2017-03-29 18:33:11.433 1155 ERROR nova Traceback (most recent call last): 
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/bin/nova-api", line 10, in <module>
2017-03-29 18:33:11.433 1155 ERROR nova     sys.exit(main())
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/cmd/api.py", line 59, in main
2017-03-29 18:33:11.433 1155 ERROR nova     server = service.WSGIService(api, use_ssl=should_use_ssl)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/service.py", line 311, in __init__
2017-03-29 18:33:11.433 1155 ERROR nova     self.app = self.loader.load_app(name)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/wsgi.py", line 497, in load_app
2017-03-29 18:33:11.433 1155 ERROR nova     return deploy.loadapp("config:%s" % self.config_path, name=name)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 247, in loadapp
2017-03-29 18:33:11.433 1155 ERROR nova     return loadobj(APP, uri, name=name, **kw)     
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 272, in loadobj
2017-03-29 18:33:11.433 1155 ERROR nova     return context.create()
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2017-03-29 18:33:11.433 1155 ERROR nova     return self.object_type.invoke(self) 
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 144, in invoke
2017-03-29 18:33:11.433 1155 ERROR nova     **context.local_conf)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/util.py", line 55, in fix_call
2017-03-29 18:33:11.433 1155 ERROR nova     val = callable(*args, **kw)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/urlmap.py", line 31, in urlmap_factory
2017-03-29 18:33:11.433 1155 ERROR nova     app = loader.get_app(app_name, global_conf=global_conf)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 350, in get_app
2017-03-29 18:33:11.433 1155 ERROR nova     name=name, global_conf=global_conf).create()  
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2017-03-29 18:33:11.433 1155 ERROR nova     return self.object_type.invoke(self) 
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 203, in invoke
2017-03-29 18:33:11.433 1155 ERROR nova     app = context.app_context.create()
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 710, in create
2017-03-29 18:33:11.433 1155 ERROR nova     return self.object_type.invoke(self) 
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/loadwsgi.py", line 146, in invoke
2017-03-29 18:33:11.433 1155 ERROR nova     return fix_call(context.object, context.global_conf, **context.local_conf)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/paste/deploy/util.py", line 55, in fix_call
2017-03-29 18:33:11.433 1155 ERROR nova     val = callable(*args, **kw)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/wsgi.py", line 270, in factory
2017-03-29 18:33:11.433 1155 ERROR nova     return cls(**local_config)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/metadata/handler.py", line 49, in __init__
2017-03-29 18:33:11.433 1155 ERROR nova     expiration_time=CONF.api.metadata_cache_expiration)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/cache_utils.py", line 58, in get_client
2017-03-29 18:33:11.433 1155 ERROR nova     backend='oslo_cache.dict'))
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/cache_utils.py", line 96, in _get_custom_cache_region
2017-03-29 18:33:11.433 1155 ERROR nova     region.configure(backend, **region_params)    
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/dogpile/cache/region.py", line 413, in configure
2017-03-29 18:33:11.433 1155 ERROR nova     backend_cls = _backend_loader.load(backend)   
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/dogpile/util/langhelpers.py", line 40, in load
2017-03-29 18:33:11.433 1155 ERROR nova     return impl.load()
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2301, in load
2017-03-29 18:33:11.433 1155 ERROR nova     self.require(*args, **kwargs)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2324, in require
2017-03-29 18:33:11.433 1155 ERROR nova     items = working_set.resolve(reqs, env, installer, extras=self.extras)
2017-03-29 18:33:11.433 1155 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 859, in resolve
2017-03-29 18:33:11.433 1155 ERROR nova     raise VersionConflict(dist, req).with_context(dependent_req)
2017-03-29 18:33:11.433 1155 ERROR nova ContextualVersionConflict: (pbr 1.10.0 (/usr/local/lib/python2.7/dist-packages), Requirement.parse('pbr>=2.0.0'), set(['oslo.i18n', 'oslo.log', 'oslo.context', 'oslo.utils']))

In this instance it was a pretty basic requirement, pbr, that was at the wrong version. When I saw this I realized that I had forgotten to use constraints (pbr is used by everything in OpenStack), so I quickly reran pip install for nova with the constraints argument to correct this issue.
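For anyone following along, a constrained install looks roughly like this (the URL points at the Ocata upper-constraints file in the requirements repo and the nova directory is just wherever your tarball unpacked; adjust both for your release):

pip install -c "http://git.openstack.org/cgit/openstack/requirements/plain/upper-constraints.txt?h=stable/ocata" ./nova-15.0.0/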

The final thing I hit was a missing sudoers file:

2017-03-29 18:29:47.844 905 ERROR nova Traceback (most recent call last):
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/bin/nova-api", line 10, in <module>
2017-03-29 18:29:47.844 905 ERROR nova     sys.exit(main())
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/cmd/api.py", line 59, in main
2017-03-29 18:29:47.844 905 ERROR nova     server = service.WSGIService(api, use_ssl=should_use_ssl)
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/service.py", line 309, in __init__
2017-03-29 18:29:47.844 905 ERROR nova     self.manager = self._get_manager()
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/service.py", line 364, in _get_manager
2017-03-29 18:29:47.844 905 ERROR nova     return manager_class()
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/api/manager.py", line 30, in __init__
2017-03-29 18:29:47.844 905 ERROR nova     self.network_driver.metadata_accept()
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/network/linux_net.py", line 606, in metadata_accept
2017-03-29 18:29:47.844 905 ERROR nova     iptables_manager.apply()
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/network/linux_net.py", line 346, in apply
2017-03-29 18:29:47.844 905 ERROR nova     self._apply()
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2017-03-29 18:29:47.844 905 ERROR nova     return f(*args, **kwargs)
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/network/linux_net.py", line 366, in _apply
2017-03-29 18:29:47.844 905 ERROR nova     attempts=5)
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/network/linux_net.py", line 1167, in _execute
2017-03-29 18:29:47.844 905 ERROR nova     return utils.execute(*cmd, **kwargs)
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/utils.py", line 297, in execute
2017-03-29 18:29:47.844 905 ERROR nova     return RootwrapProcessHelper().execute(*cmd, **kwargs)
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/nova/utils.py", line 180, in execute
2017-03-29 18:29:47.844 905 ERROR nova     return processutils.execute(*cmd, **kwargs)
2017-03-29 18:29:47.844 905 ERROR nova   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 400, in execute
2017-03-29 18:29:47.844 905 ERROR nova     cmd=sanitized_cmd)
2017-03-29 18:29:47.844 905 ERROR nova ProcessExecutionError: Unexpected error while running command.
2017-03-29 18:29:47.844 905 ERROR nova Command: sudo nova-rootwrap /etc/nova/rootwrap.conf iptables-save -c
2017-03-29 18:29:47.844 905 ERROR nova Exit code: 1
2017-03-29 18:29:47.844 905 ERROR nova Stdout: u''
2017-03-29 18:29:47.844 905 ERROR nova Stderr: u'sudo: no tty present and no askpass program specified\n'

Nova needs root privileges to perform some operations. To do this it leverages a program called rootwrap to do the privilege escalation. But it needs sudo to be able to leverage rootwrap. I was able to fix this by creating a sudoers file for nova like:

nova ALL=(root) NOPASSWD: /usr/local/bin/nova-rootwrap /etc/nova/rootwrap.conf
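For reference, I’d expect a rule like that to live in its own file under /etc/sudoers.d/ with restrictive permissions; the exact path and mode here are my assumptions rather than something spelled out in the install guide:

echo 'nova ALL=(root) NOPASSWD: /usr/local/bin/nova-rootwrap /etc/nova/rootwrap.conf' > /etc/sudoers.d/nova
chmod 0440 /etc/sudoers.d/nova
visudo -c -f /etc/sudoers.d/nova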

After correcting those 3 issues I got Nova running without any errors (at least with the verification steps outlined in the install guide).

Installing Neutron

The last service I’m installing from the install guide (I skipped cinder because I’m not using block storage) is Neutron. This was by far the most complicated and difficult service to install and configure. I had the most problems with neutron and networking in general, both during the install phase and later when I was debugging the operation of the cloud. As with the other services, I started by reading the install guide section for neutron, but I also often needed to read the OpenStack Networking Guide to get a better grasp on the underlying concepts the install guide was trying to explain, especially after getting to the section in the install guide where it asks you to pick between “Provider Networks” or “Self Service Networking”.

After reading all the documentation I decided that I wanted to use provider networks, because all I wanted was my guests on a flat Layer 2, coming up on my home network with an IP address I could reach from any of the other computers I have at home. When I saw this diagram in the Networking Guide:

it made my decision simple. This networking topology was exactly what I wanted. I didn’t want to have to deal with creating a network, subnet, and router in neutron for each tenant to be able to access my guests. With this decision made I went about following the configuration guide like for the previous services.
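For reference, the end state I was aiming for is a flat provider network defined roughly like this once neutron is up (the network names, physical network label, and addresses are placeholders for my home LAN, not values from the guide):

openstack network create --share --external \
    --provider-physical-network provider \
    --provider-network-type flat provider
openstack subnet create --network provider \
    --subnet-range 192.168.1.0/24 --gateway 192.168.1.1 \
    --allocation-pool start=192.168.1.200,end=192.168.1.250 provider-subnet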

Unfortunately I hit an issue pretty early on, related to Neutron’s default configuration being spread across multiple files, which makes it very confusing to follow the install guide. For example, it says you want to write one set of config options into /etc/neutron/neutron.conf, then a second set of config options into /etc/neutron/plugins/ml2/ml2_conf.ini, and a third set of config options into /etc/neutron/plugins/ml2/linuxbridge_agent.ini, etc. This process continues for another 2 or 3 config files without any context on how these separate files are used. What makes it worse is when you actually go to launch the neutron daemons. Neutron itself consists of 4-5 different daemons running on the controller and compute nodes, but there is no documentation anywhere on how all of these different config files are leveraged by the different daemons. For example, when launching the linuxbridge-agent daemon, which config files are you supposed to pass in? I ended up having to cheat and look at the devstack source code to see how it launched neutron there. After that I realized neutron is just leveraging oslo.config’s ability to specify multiple config files and have them concatenated together at runtime. This means that, because there are no overlapping options, none of this complexity is required and a single neutron.conf could be used for everything. This is something I think we must change in Neutron, because things as they are now are just too confusing.
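To illustrate, the agent ends up being launched with several --config-file arguments, along these lines (the exact file list is my reading of what devstack does, not something spelled out in the install guide):

neutron-linuxbridge-agent \
    --config-file /etc/neutron/neutron.conf \
    --config-file /etc/neutron/plugins/ml2/linuxbridge_agent.ini

oslo.config simply merges every file passed with --config-file, so the split across multiple files is purely organizational.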

After finally getting everything configured I encountered a number of other issues. The first was around rootwrap: just like nova, neutron needs root privileges to perform some operations, and it leverages rootwrap to perform the privilege escalation. However, neutron uses rootwrap as a separate daemon, and calls it over a socket interface (this is done to reduce the overhead of creating a separate python process on each external call, which can slow things down significantly). When I first started neutron I hit a similar error to nova about sudo permissions, so I needed to create a sudoers file for neutron; in my case it looked like this:

neutron ALL=(root) NOPASSWD: /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf *
neutron ALL=(root) NOPASSWD: /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

But it also turns out I needed to tell neutron how to call rootwrap. I found this bug on launchpad when I did a google search on my error, and it told me about the config options I needed to set in addition to creating the sudoers file. These weren’t in the install documentation, as I expect the neutron distro packages set these config options by default. After creating the sudoers file and setting the config flags I was able to get past this issue.
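To the best of my recollection the options in question are the root_helper settings in the [agent] section of neutron.conf, pointing at the commands allowed in the sudoers file above:

[agent]
root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf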

The next problem was also fairly cryptic. When I first started neutron after fixing the rootwrap issue I was greeted by this error in the logs:

2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent Traceback (most recent call last):
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 453, in daemon_loop
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     sync = self.process_network_devices(device_info)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 153, in wrapper
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     return f(*args, **kwargs)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/agent/_common_agent.py", line 203, in process_network_devices
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     device_info.get('updated'))
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 277, in setup_port_filters
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     self.prepare_devices_filter(new_devices)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 131, in decorated_function
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     *args, **kwargs)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 139, in prepare_devices_filter
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     self._apply_port_filter(device_ids)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 157, in _apply_port_filter
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     security_groups, security_group_member_ips)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py", line 173, in _update_security_group_info
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     remote_sg_id, member_ips)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/iptables_firewall.py", line 163, in update_security_group_members
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     self._update_ipset_members(sg_id, sg_members)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/iptables_firewall.py", line 169, in _update_ipset_members
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     sg_id, ip_version, current_ips)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/ipset_manager.py", line 83, in set_members
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     self.set_members_mutate(set_name, ethertype, member_ips)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     return f(*args, **kwargs)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/ipset_manager.py", line 93, in set_members_mutate
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     self._create_set(set_name, ethertype)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/ipset_manager.py", line 139, in _create_set
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     self._apply(cmd)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/ipset_manager.py", line 149, in _apply
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     check_exit_code=fail_on_errors)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 128, in execute
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     execute_rootwrap_daemon(cmd, process_input, addl_env))
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 115, in execute_rootwrap_daemon
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     return client.execute(cmd, process_input)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/local/lib/python2.7/dist-packages/oslo_rootwrap/client.py", line 129, in execute
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     res = proxy.run_one_command(cmd, stdin)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "<string>", line 2, in run_one_command
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent   File "/usr/lib/python2.7/multiprocessing/managers.py", line 774, in _callmethod
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent     raise convert_to_error(kind, result)
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent RemoteError:
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent ---------------------------------------------------------------------------
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent Unserializable message: ('#ERROR', ValueError('I/O operation on closed file',))
2017-03-30 11:57:05.182 4158 ERROR neutron.plugins.ml2.drivers.agent._common_agent ---------------------------------------------------------------------------

Which isn’t helpful at all. It turns out that this error means that neutron can’t find the ipset command, but that’s not at all clear from the traceback. I was only able to figure this out by tracing through the neutron source code (following the calls in the traceback), which showed that this error is being emitted after neutron calls the rootwrap daemon. I had to turn the debug log level on in the separate rootwrap.conf (which is something packaged in the tarball) to get the rootwrap daemon to log the error message it was encountering, which in this case was that ipset could not be found. After installing ipset this was corrected.
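If memory serves, turning that logging on amounts to flipping the syslog options in rootwrap.conf to something like the following (double-check the option names against the rootwrap.conf shipped in the tarball):

[DEFAULT]
use_syslog = True
syslog_log_level = DEBUG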

After all of these headaches I finally got neutron running. But I quickly found that my choice of provider networks was causing issues with DHCP on my home network. I only have a single 24 port unmanaged switch at home, and the bridge interfaces for the guests were on the same Layer 2 network as the rest of my home infrastructure, including my DHCP server. This meant that when I created a server in the cloud the DHCP request from the guest would go out and be received by both the neutron DHCP agent and my home DHCP server, because being on the same Layer 2 network meant they shared a broadcast domain. Luckily neutron’s default security group rules blocked the DHCP response from my home server, but there was still a lease record being created on my home server, and if I ever loosened the security group rules and DHCP traffic was allowed there would be a race condition between my server and the neutron agent. It turns out there was a small note (see step 3) on this potential problem in the networking guide. So my solution was to disable DHCP in neutron and also stop running the DHCP agent on my cloud. This had a ripple effect in that I couldn’t use the metadata service either, because it depends on DHCP to set the route for the hardcoded IP address of the metadata server. (this will come up later) Luckily I was able to leverage the force_config_drive option in Nova to make sure the metadata service wasn’t necessary.
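Concretely, that workaround boils down to something like the following (the subnet name is a placeholder and the DHCP agent unit name depends on how you created your services):

openstack subnet set --no-dhcp provider-subnet
systemctl stop neutron-dhcp-agent

plus, in nova.conf on the compute nodes:

[DEFAULT]
force_config_drive = True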

I modified the network diagram above for what I ended up with in my cloud:

(note I’m terrible at art, so you can clearly tell where I made changes)

If all of the above didn’t make it clear, I still find Neutron the roughest part of the user experience for OpenStack. Besides complexity in configuration, it also presumes a decent understanding of networking concepts. I fully admit networking is hard, especially for clouds because you’re dealing with a lot of different pieces, but this is somewhere I feel we need to make improvements, especially for use cases like mine where the requirements were pretty straightforward. I just wanted to have a server come up on my home network when it was booted so I could log into it right afterwards. In my opinion this is what the majority of cloud consumers (people using the API) care about: just getting an IP address (v4 or v6, it doesn’t really matter) and being able to connect to it from their personal machines. After going through this process I’m pretty sure that my college student self, who had a much more limited understanding of networking than I do now, would have had a very difficult time figuring this out.

Booting the first server

After getting everything running on a single node it was time to boot my first server. I eagerly typed in the

openstack server create

command with all the parameters for my credentials, the flavor, and the image I had uploaded, and waited for the server to go to the ACTIVE state by running:

openstack server list

a few times. Once the server went into the ACTIVE state I tried to log into the guest with ssh, and got nothing. The ssh connection just timed out and there wasn’t any indication why. Having debugged a ton of issues like this over the years, my first guess was that I had screwed up the networking, so let me look at the console log by running:

openstack console log show test-server

and it returned nothing. I was a bit lost as to why; the console log should show the process of booting the operating system. I figured that I had made a configuration mistake in nova, so to double check I logged into the compute node, checked the libvirt state directory, and confirmed that the console log file was empty. But this left me at an impasse: why would the guest not be logging anything to the console on boot? So I just started sanity checking everything I could find. When I looked at Nova’s local image cache I saw the cirros image was 0 bytes in size. A cirros image should be about 13MB in size, so 0 bytes was clearly wrong. From there I started tracing through the glance logs to figure out where the data was getting lost (was it nova downloading the image from glance, or did glance have an empty image) when I found:

DEBUG glance_store._drivers.filesystem [req-3163a1a7-4ca9-47e8-9444-cd8b865055fb 20f283024ffd4bf4841a8d33bdb4f385 6c3fc6392e0c487e85d57afe5a5ab2b7 - default default] Wrote 0 bytes to /var/lib/glance/images/e6735636-43d9-4fb0-a302-f3710386b689 with checksum d41d8cd98f00b204e9800998ecf8427e add /usr/local/lib/python2.7/dist-packages/glance_store/_drivers/filesystem.py:706

Which was the only hint I could find in the glance logs. It wasn’t even that useful; all it said was that glance wrote 0 bytes to disk for the uploaded image. That at least confirmed that glance wasn’t storing any data from the image upload, but I couldn’t find any other information about it. So I decided to re-upload the image to glance and use tcpdump on both my desktop and the server to make sure the data was getting sent over the wire to glance. The output of tcpdump showed all the data being sent and received. This at least meant that the data was getting to the glance api server, but it didn’t really help me figure out where the data was going.

With no other ideas I decided to “instrument” the glance code by manually adding a bunch of log statements to the installed python code in

/usr/local/lib/python2.7/site-packages/glance

by hand to trace the data flow through the glance code and find where the image goes from 13MB to 0 bytes. When I did this I was able to figure out that the image data was being lost outside of the glance code, in one of its requirement libraries, either webob, paste, or something like that. When I saw that I realized that I had forgotten to use constraints when installing glance. I quickly rushed to reinstall glance from the tarball using the constraints parameter and restarted the service. After doing this and re-uploading the image everything worked!

My only mistake in that process was that, in my over-eagerness to fix the problem, I forgot to take notes of exactly what I reinstalled to see where the actual problem was. So all I can say for sure is: make sure you use constraints whenever you install from source, because clearly there was an issue with just using pip install by itself.

After getting glance working I was able to re-run the openstack command to create a server and this time I was able to get a console log, but ssh still didn’t work.

Networking Woes

At this point I had the servers booting, but I wasn’t able to log in to them. I’ve personally had to debug this kind of issue many times, so when I saw this my first step was to ping the IP address for the guest, just to rule out an issue with the ssh daemon on the server. Since the ping didn’t work I wanted to see if there was an entry in my arp table for the IP address. Again, there was nothing on that IP after running the arp command. So this either meant there was an issue with Layer 2 connectivity to the guest from my desktop, or the guest didn’t know its IP address. (I’ve personally seen both failure conditions) My next step was to check the console log to see if it was setting an IP address correctly. When I got to the cloud-init section of the console log it showed that the IP address was never getting assigned; instead the server was timing out waiting for a DHCP lease. If you remember the neutron section above, I had to disable DHCP on the guests because it was conflicting with my home’s DHCP server, so this clearly wasn’t right.

It turns out that cloud-init doesn’t know how to deal with static networking configuration from a config drive. (it might work with a metadata server, but I was not able to check this) So when the guest boots it just ignores the static networking information in the config drive and then tries to get a DHCP lease. This meant that cirros, the recommended image for testing and what the install guide tells you to use, wasn’t going to work, and the majority of cloud images you can download weren’t going to work either. The only cloud image I was able to get working was the official ubuntu cloud image. This was because Nova was doing file injection to write the networking information directly into the guest file system. I found a useful blog post on this in my searching: http://blog.oddbit.com/2015/06/26/openstack-networking-without-dhcp/ (although the translation didn’t work on RHEL like that post indicates) But, even if I got ubuntu to work, having a cloud that was only able to boot a single type of image isn’t really that useful.

Luckily the OpenStack Infrastructure team has a similar problem on some of the public OpenStack clouds they run things on, and they created the Glean project to be an alternative to cloud-init that can properly use the static networking information from a config drive. All I had to do was leverage the Disk Image Builder project to create the images I uploaded into my cloud with glean instead of cloud-init. While not an ideal solution, because you can’t take anyone’s random pre-existing cloud image, this worked well enough for me because I can remember to do this as the primary user of my cloud.
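For the curious, building such an image comes down to something like this (the element names reflect my reading of the diskimage-builder docs, where simple-init is the element that swaps cloud-init for glean; verify against the current DIB documentation):

pip install diskimage-builder
DIB_RELEASE=xenial disk-image-create ubuntu-minimal vm simple-init -o ubuntu-glean.qcow2
openstack image create --disk-format qcow2 --container-format bare \
    --file ubuntu-glean.qcow2 ubuntu-glean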

It’s also worth pointing out that all of these networking issues would have been completely avoided if I had chosen self service networking back in the setting up neutron section (because it creates a separate Layer 2 network for each tenant). But, given my goals with the cloud and the way the documentation lays out the options, I had no way to know this. This connects back to my earlier complaints with neutron being too complex and presuming too much prior knowledge.

But, at this point I had a working single node cloud and could successfully boot guests. All that was left before I finished the cloud deployment was to replicate the installation on the remaining 4 servers.

Setting Up the Compute Nodes

Once I had confirmed a working configuration and gotten all the services figured out on the controller node (which included nova-compute and the necessary neutron services for a compute node, because it was an all-in-one) and got everything running there, it was time to set up the compute nodes. This was pretty straightforward and just involved configuring nova-compute and the neutron services. It was pretty formulaic and basically just copy and paste. The exact procedure that I wrote down in my notes for this process was:

  1. add provider network interface config
  2. disable apparmor
  3. reboot
  4. download tarballs
  5. create system users
  6. add nova user to libvirt group
  7. install all binaries (libvirt, qemu, ipset, mkisofs, libssl-dev, pip)
  8. make service dirs /etc/ /var/lib for both neutron and nova
  9. copy etc dirs from tarballs to /etc
  10. pip install code with upper-constraints
  11. write config files (basically just copy from another compute node)
  12. set permissions on /etc and /var/lib
  13. create sudoers files for nova and neutron
  14. create systemd unit files (see the sketch after this list)
  15. start services
  16. run nova discover_hosts
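To give a concrete flavor of steps 14 and 16, here is a minimal sketch of a compute node unit file and the host discovery run (paths and unit names are illustrative, not copied from my notes):

# /etc/systemd/system/nova-compute.service
[Unit]
Description=OpenStack Nova Compute
After=network.target

[Service]
User=nova
ExecStart=/usr/local/bin/nova-compute --config-file /etc/nova/nova.conf
Restart=on-failure

[Install]
WantedBy=multi-user.target

and then, from the controller, registering the new compute hosts with the cells v2 database:

nova-manage cell_v2 discover_hosts --verbose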

This is basically just copying and pasting things across the remaining 4 servers, but a couple of lessons I learned from the initial install are reflected in these steps. The only one I haven’t talked about before is disabling apparmor (or SELinux on other linux distros). I learned the hard way that the default apparmor rules on Ubuntu prevent nova and libvirt from doing the necessary operations to boot a guest. The proper way to fix this issue (especially for better security) would be to create your own apparmor rules to allow the operations being blocked. But, I have always been confused by this, especially on SELinux, and didn’t even bother trying. I just disabled apparmor and moved on.
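On Ubuntu 16.04 that amounted to roughly the following (the exact incantation varies by release, and some people prefer passing apparmor=0 on the kernel command line instead):

systemctl stop apparmor
systemctl disable apparmor
reboot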

After repeating these steps across the 4 compute nodes I had a fully operational cloud. Nova was showing me the full capacity of 80 vCPUs and I could interact with the cloud and launch guests across all of them. My project was complete! (at least for the first phase)

Conclusion

So after writing all of this down I came to the realization that I likely give the impression that installing OpenStack by hand is an impossibly complex task. But honestly, it wasn’t that bad of an experience. Sure, OpenStack is complex software with a lot of moving pieces, but in total I got everything working in 2-3 days (and I wasn’t dedicating all my time during those days either). The majority of the issues that I hit were caused solely by my insistence on installing everything from tarballs. If I had actually followed my original thought experiment and just followed the install guide, the only issue I probably would have hit was with networking. Once you understand what OpenStack is doing under the covers the install is pretty straightforward. After doing my first OpenStack install a few years ago I found I had a better understanding of how OpenStack works, which really helped me in my work on the project. It’s something I recommend that everyone does at least once if they’re planning on working on OpenStack in any capacity. Even just in a VM for playing around. (devstack doesn’t count)

For comparison, that rack of similar Dell servers I deployed back in college took me much longer to get running. In that case I used xCAT for deployment automation, but it still took me over a month to get the InfiniBand cards working with RDMA using OFED, set up SLURM for MPI job scheduling, connect everything to our central LDAP server, and have users able to launch jobs across all the nodes. While it’s not entirely a fair comparison, since I have almost a decade more of experience now, I think it helps put into perspective that this is far from the most grueling experience I’ve had installing software.

After going through the whole exercise I don’t actually run this cloud 24/7, mostly because it heats up my apartment too much and I can’t sleep at night when it’s running. The power consumption for the servers is also pretty high and I don’t really want to pay the power bill. This basically means I failed the second half of the experiment, to virtualize my home infrastructure, since I can’t rely on the cloud for critical infrastructure if it isn’t always running. But I have found some uses for the cloud, both for development tasks as well as running some highly parallel CPU tasks across the entire cloud at once.

Moving forward I intend to continue working on the cloud and upgrading it to future releases as they occur. Also one of my goals for this entire exercise was going back to the OpenStack community with feedback on how to improve things and submitting patches and/or bugs on fixing some of the issues. This will be an ongoing process as I find time to work on them and also encounter more issues.

by Matthew Treinish at July 20, 2017 03:32 PM

OpenStack Superuser

Two ways to contribute to OpenStack to make a big impact

At the previous Summit, I met with various companies that were both new and community veterans. Both types asked a very common question that just about everyone in the community has: Where should I contribute in OpenStack to make an impact?

The OpenStack Technical Committee (TC) met soon after and came up with the Top 5 help most wanted list.

Companies looking for excellent ways to contribute upstream should give these two priorities (so far!) a good look:

Documentation owners

Deployments can be complex for new adopters, so it’s essential that we have good documentation to support the operators and users in our community. Future code contributors coming to OpenStack will also appreciate being able to use the software that they helped make great.

Without a doubt an important part of our community, the documentation team has been struggling with resources since the dawn of OpenStack. The community is now moving forward on decentralizing documentation for OpenStack services — the idea is that each service owns its documentation.

Interested? Contact the Documentation PTL Alex Settle (asettle) or the TC sponsor for this item Doug Hellman (dhellmann) on Freenode IRC. (If you’re new to IRC, read this first.)

Glance contributors

Glance, an early OpenStack project, is deployed in almost every OpenStack cloud since it’s needed by Nova to boot instances from managed images.

The Glance team is struggling to find new contributors to tackle its technical debt. It’s a great project to get started in OpenStack — they’ve been a welcoming project to interns, junior developers and senior developers alike.

Interested? Join the Glance IRC channel (#openstack-glance) or reach out to the Glance PTL Brian Rosmaita (rosmaita) or the TC sponsor for this item Flavio Percoco (flaper87) on Freenode IRC, or start a new email thread on the OpenStack dev mailing list using the tag [glance].

What else?

While the TC continues to evaluate suggested priorities in the future for inclusion, there are a few other ways you can help out upstream now:

  • Support teams beyond Docs (Infra, QA, release management, stable maintenance…)
  • Other small teams hungry for resources beyond Glance (Keystone, Designate, Horizon…)
  • Inter-project work (drive features across multiple projects, champion release goals, participate in working groups like API Working Group or Stewardship Working Group.)

 

Mike Perez is the cross-project developer coordinator at the OpenStack Foundation. You can find him as thingee on IRC and Twitter.

The post Two ways to contribute to OpenStack to make a big impact appeared first on OpenStack Superuser.

by Mike Perez at July 20, 2017 11:42 AM

July 19, 2017

NFVPE @ Red Hat

yakLab Part 1b: Kickstart File Build Out

In scene 1b, we’ll continue with our work from the Building the virtual Cobbler deployment and get a kickstart file loaded into Cobbler. I’m going to be mostly reviewing the kickstart file itself, and not really getting into how to manage the Cobbler process itself (that’s left as an exercise for the reader).

by Leif Madsen at July 19, 2017 05:54 PM

Chris Dent

TC Report 29

This TC Report is a bit late. Yesterday I was attacked by an oyster.

This week had no meeting, so what follows is a summary of various other TC related (sometimes only vaguely related) activity.

Vision

The TC Vision has been merged, presented in a way that makes sure that it's easy and desirable to create a new vision at a later date to respond to changing circumstances. There were concerns during the review process that the document as is does not take into account recent changes in the corporate and contributor community surrounding OpenStack. The consensus conclusion, however, was that the goals stated in the vision remain relevant and productive work has already begun.

Hosted Projects

The conversation about hosted projects continues, mostly in regard to that question with great stamina: Is OpenStack Infrastructure as a Service or something more encompassing of all of cloud? In either case what does it take for something to be a complete IAAS or what is "cloud"? There was a useful posting from Zane pointing out that the varied assumptions people bring to the discussion are a) varied, b) assumptions.

It feels likely that these discussions will become more fraught during times of pressure, but they have no easy answers. As long as the discussions don't devolve into name calling, I think each repeat round is useful as it brings new insights to the old hands and keeps the new hands informed of stuff that matters. Curtailing the discussion simply because we have been over it before is disrespectful to the people who continue to care and to the people for whom it is new.

I still think we haven't fully expressed the answers to the questions about the value and cost that any project being officially in OpenStack has for that project or for OpenStack. I'm not asserting anything about the values or the costs; knowing the answers is simply necessary to have a valid conversation.

Glare

The conversation about Glare becoming official continued, but more slowly than before. The plan at this stage is to discuss the issues in person at the PTG where the Glare project will have some space. ttx made a brief summary; there's no objection to Glare becoming official unless there is some reason to believe it will result in issues for Glance (which is by no means pre-determined).

SIGs

The new openstack-sigs mailing list was opened with a deliberately provocative thread on How SIG Work Gets Done. This resulted in comments on how OpenStack work gets done, how open source work gets done, and even whether open source behaviors fully apply in the OpenStack context.

by Chris Dent at July 19, 2017 12:15 PM

OpenStack Superuser

Running an OpenStack cloud? Go to the next Operators Meetup

If you run an OpenStack cloud, attending the next Ops Meetup is a great way to swap best practices and share war stories.

This time around, it’ll be held in Mexico City’s Digital Culture Center, August 9-10. (Previous editions focused on providing input for upcoming releases; however, with the addition of the Forum, ops folks are now invited to collaborate with OpenStack upstream developers to share feedback and shape upcoming releases at the Summit.)

You still have time to influence the sessions – so check out the Etherpad, where you’ll also find hotel info. Suggestions so far include containers, an Ironic workshop and upgrade challenges. Tickets, limited to 150 participants, cost $20.

Got questions? Reach out to Tom Fifield at tomATopenstack.org or the OpenStack Ops mailing list.

Cover Photo // CC BY NC

The post Running an OpenStack cloud? Go to the next Operators Meetup appeared first on OpenStack Superuser.

by Superuser at July 19, 2017 12:11 PM

Giulio Fidente

Understanding ceph-ansible in TripleO

One of the goals for the TripleO Pike release was to introduce ceph-ansible as an alternative to puppet-ceph for the deployment of Ceph.

More specifically, to put operators in control of the playbook execution as if they were launching ceph-ansible from the commandline, except it would be Heat starting ceph-ansible at the right time during the overcloud deployment.

This demanded some changes in different tools used by TripleO and went through a pretty long review process, eventually putting in place some useful bits for the future integration of Kubernetes and the migration to an ansible-driven deployment of the overcloud configuration steps in TripleO.

The idea was to add a generic functionality allowing the triggering of a given Mistral workflow during the deployment of a service. Mistral could then have executed any action, including for example an ansible playbook, provided it was given all the necessary input data for the playbook to run and the roles list to build the hosts inventory.

This is how we did it.

Run ansible-playbook from Mistral (1)
An initial submission added support for the execution of ansible playbooks as workflow tasks in Mistral https://github.com/openstack/tripleo-common/commit/e6c8a46f00436edfa5de92e97c3a390d90c3ce54

A generic action for Mistral which workflows can use to run an ansible playbook. +2 to Dougal and Ryan.

Deploy external resources from Heat (2)
We also needed a new resource in Heat to be able to drive Mistral workflow executions https://github.com/openstack/heat/commit/725b404468bdd2c1bdbaf16e594515475da7bace so that we could orchestrate the executions like any other Heat resource. This is described much in detail in a Heat spec.

With these two, we could run an ansible playbook from a Heat resource, via Mistral. +2 to Zane and Thomas for the help! Enough to start messing in TripleO and glue things together.

Describe what/when to run in TripleO (3)
We added a mechanism in the TripleO templates to make it possible to describe, from within a service, a list of tasks or workflows to be executed at any given deployment step https://github.com/openstack/tripleo-heat-templates/commit/71f13388161cbab12fe284f7b251ca8d36f7635c

There aren't restrictions on what the tasks or workflows in the new section should do. These might deploy the service or prepare the environment for it or execute code (eg. build Swift rings). The commit message explains how to use it:

service_workflow_tasks:
  step2:
    - name: my_action_name
      action: std.echo
      input:
        output: 'hello world'

The above snippet would make TripleO run the Mistral std.echo action during the overcloud deployment, precisely at step 2, assuming you create a new service with the code above and enable it on a role.

For Ceph we wanted to run the new Mistral action (1) and needed to provide it with the config settings for the service, normally described within the config_settings structure of the service template.

Provide config_settings to the workflows (4)
The decision was to make all config settings available in the Mistral execution environment so that ansible actions could, for example, use them as extra_vars https://github.com/openstack/tripleo-heat-templates/commit/8b81b363fd48b0080b963fd2b1ab6bfe97b0c204

Now all config settings normally consumed by puppet were available to the Mistral action and playbook settings could be added too, +2 Steven.

Build the data for the hosts inventory (5)
Together with the above, another small change provided into the execution environment a dictionary mapping every enabled service to the list of IP address of the nodes where the service is deployed https://github.com/openstack/tripleo-heat-templates/commit/9c1940e461867f2ce986a81fa313d7995592f0c5

This was necessary to be able to build the ansible hosts inventory.

Create a workflow for ceph-ansible (6)
Having all the pieces available to trigger the workflow and pass it the service config settings, we needed the workflow which would actually run ceph-ansible https://github.com/openstack/tripleo-common/commit/fa0b9f52080580b7408dc6f5f2da6fc1dc07d500 plus some new, generic Mistral actions so it could run smoothly multiple times (eg. stack updates) https://github.com/openstack/tripleo-common/commit/f81372d85a0a92de455eeaa93162faf09be670cf

This is the glue which runs a ceph-ansible playbook with the given set of parameters. +2 John.

Deploy Ceph via ceph-ansible (7)
Finally, the new services definition for TripleO https://review.openstack.org/#/c/465066/ to deploy Ceph in containers via ceph-ansible, including a couple of params operators can use to push arbitrary extra_vars for ceph-ansible into the Mistral environment.

The deployment with ceph-ansible is activated with the ceph-ansible.yaml environment file.
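In practice that looks something like this from the undercloud (the environment file path is where I would expect it in a default templates install; adjust to wherever your templates live):

openstack overcloud deploy --templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml

together with whatever other environment files your deployment already uses.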

Interestingly the templates to deploy Ceph using puppet-ceph are unchanged and continue to work as they used to so that for new deployments it is possible to use alternatively the new implementation with ceph-ansible or the pre-existing implementation using puppet-ceph. Only ceph-ansible allows for the deployment of Ceph in containers.

Big +2 also to Jiri (who doesn't even need a blog or twitter) and all the people who helped during the development process with feedback, commits and reviews.

Soon another article with some usage examples and debugging instructions!

by Giulio Fidente at July 19, 2017 09:00 AM

July 18, 2017

Red Hat Stack

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 3

In Part 1 of this series Federico Iezzi, EMEA Cloud Architect with Red Hat covered the architecture and planning requirements to begin the journey into achieving zero packet loss in Red Hat OpenStack Platform 10 for NFV deployments. In Part 2 he went into the details around the specific tuning and parameters required. Now, in Part 3, Federico concludes the series with an example of how all this planning and tuning comes together!


Putting it all together …

So, what happens when you use the cpu tuning features?

Well, it depends on the hardware choice of course. But to see some examples we can use Linux perf events to see what is going on. Let’s look at two examples.

Virtual Machine

On a KVM VM, you will have the ideal results because you don’t have all of the interrupts from the real hardware:

$ perf record -g -C 1 -- sleep 2h
$ perf report --stdio -n
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 100  of event 'cpu-clock'
# Event count (approx.): 25000000
#
# Children      Self  Command  Shared Object      Symbol               
# ........  ........  .......  .................  .....................
#
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] default_idle
            |
            ---default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
            |
            ---arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] cpu_startup_entry
            |
            ---cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%   100.00%  swapper  [kernel.kallsyms]  [k] native_safe_halt
            |
            ---start_secondary
               cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt
   100.00%     0.00%  swapper  [kernel.kallsyms]  [k] start_secondary
            |
            ---start_secondary
               cpu_startup_entry
               arch_cpu_idle
               default_idle
               native_safe_halt

Physical Machine

On physical hardware, it’s quite different. The best results involved blacklisting a bunch of ipmi and watchdog kernel modules:

$ modprobe -r iTCO_wdt iTCO_vendor_support
$ modprobe -r i2c_i801
$ modprobe -r ipmi_si ipmi_ssif ipmi_msghandler

Note: If you have a different watchdog than the example above (iTCO is for Supermicro motherboards), check out the kernel modules folder where you can find the whole list: /lib/modules/*/kernel/drivers/watchdog/
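A quick way to check what is present and loaded on a given box:

$ ls /lib/modules/$(uname -r)/kernel/drivers/watchdog/
$ lsmod | egrep 'wdt|ipmi'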

Here’s the perf command and output for physical:

$ perf record -F 99 -g -C 2 -- sleep 2h
$ perf report --stdio -n
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4  of event 'cycles:ppp'
# Event count (approx.): 255373
#
# Children      Self       Samples  Command  Shared Object      Symbol                                        
# ........  ........  ............  .......  .................  ..............................................
#
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] generic_smp_call_function_single_interrupt
            |
            ---generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] smp_call_function_single_interrupt
            |
            ---smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] call_function_single_interrupt
            |
            ---call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] cpuidle_idle_call
            |
            ---cpuidle_idle_call
               call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore
    99.83%     0.00%             0  swapper  [kernel.kallsyms]  [k] arch_cpu_idle
            |
            ---arch_cpu_idle
               cpuidle_idle_call
               call_function_single_interrupt
               smp_call_function_single_interrupt
               generic_smp_call_function_single_interrupt
               |          
                --99.83%--nmi_restore

Using mpstat, and excluding the hardware interrupts, the results are as follows:

Please note: one CPU core per socket has been excluded – in this case using two Xeon E5-2640 V4.

$ mpstat -P 1,2,3,4,5,6,7,8,9,11,12,13,14,15,16,17,18,19,21,22,23,24,25,26,27,28,29,31,32,32,34,35,36,37,38,39 3600
Linux 3.10.0-514.16.1.el7.x86_64 (ws1.localdomain)      04/20/2017      _x86_64_     (40 CPU)

03:05:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:       1    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       2    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       5    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       7    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       8    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:       9    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      11    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      13    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      15    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      19    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      21    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      22    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      24    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      25    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      26    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      27    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      28    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      29    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      32    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      34    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      35    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      36    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      37    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      38    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      39    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00

Cool, right? Want to know more about Linux perf events? Check out the following links:

Zero Packet Loss, achieved …

As you can see, using tuned and the cpu-partitioning profile is exceptional in that it exposes a lot of deep Linux tuning which usually only a few people know about.
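For anyone who has not used it yet, activating the profile looks roughly like this (the isolated core list is only an example and must match the cores you dedicate to PMD threads and VNF vCPUs):

$ yum install -y tuned-profiles-cpu-partitioning
$ echo "isolated_cores=2-19,22-39" >> /etc/tuned/cpu-partitioning-variables.conf
$ tuned-adm profile cpu-partitioning
$ reboot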


And with a combination of tuning, service settings, and plenty of interrupt isolation (over 50% of the total settings are about interrupt isolation!) things really start to fly.

Finally, once you make sure the PMD threads and VNF vCPUs do not get interrupted by other threads, allowing for proper CPU core allocation, zero packet loss is achieved.

Of course, there are other considerations such as the hardware chosen, the VNF quality, and the number of PMD threads, but, generally speaking, those are the main requirements.

Further Reading …

Red Hat Enterprise Linux Performance Tuning Guide
Network Functions Virtualization Configuration Guide (Red Hat OpenStack Platform 10)


Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

by m4r1k at July 18, 2017 10:46 PM

NFVPE @ Red Hat

BYOB – Bring your own boxen to an OpenShift Origin lab!

Let’s spin up an OpenShift Origin lab today; we’ll be using openshift-ansible with a “BYO” (bring your own) inventory. Or I’d rather say “BYOB” for “Bring your own boxen”. OpenShift Origin is the upstream OpenShift – in short, OpenShift is a PaaS (platform-as-a-service), but one that is built with a distribution of Kubernetes, and in my opinion is so valuable because of its strong opinions, which guide you towards some best practices for using Kubernetes in the enterprise. In addition, we’ll use my openshift-ansible-bootstrap which we can use to A. spin up some VMs to use in the lab, and/or B. set up some basics on the host to make sure we can properly install OpenShift Origin. Our goal today will be to set up an OpenShift Origin cluster with a master and two compute nodes, verify that it’s healthy, and deploy a very basic pod.

by Doug Smith at July 18, 2017 07:00 PM

OpenStack Superuser

Looking ahead to OpenStack Days China

OpenStack Days China will take place in Beijing July 24-25. This year’s event, expected to set attendance records, features speakers including the OpenStack Foundation’s Jonathan Bryce, Alan Clark and Lauren Sell and is sponsored by companies including Huawei, EasyStack and Intel.

OpenStack is very hot in China right now — almost all of the major IT vendors and telcos in China are part of the OpenStack community: China Mobile, China Telecom, Huawei, Inspur, 99Cloud, EasyStack, UnitedStack and ZTE are currently Gold members of OpenStack and many more Chinese companies are corporate sponsors of OpenStack. Last year, the first OpenStack Day China event drew 2,400 attendees.

What are the expectations for this year’s event?

“We have a lot more support from the Foundation and technical leaders including Foundation CxOs, VPs, TCs, UCs, and PTLs,” Huawei’s Anni Lai tells Superuser. “I think the content will be a lot more technical and interesting. I look forward to hearing more interesting real-life use cases.” In terms of future outlook, Lai says that because so many telcos will have OpenStack running on their public and private clouds she expects that “their deployments will be huge, in the area of thousands of physical nodes.”

To get ready, check out this session from the recent Boston Summit. A panel put together by Lai features Shane Wang, individual board director and engineering manager at Intel, 99Cloud’s Huang Shuquan, Shi Kui from EasyStack and Yangly from the China Electronics Standardization Institute (CESI).

“These are the heroes of the Chinese OpenStack community, they’ve done tremendous work,” says Lai. “They’ve enabled a lot of large operators in China…” She outlined the recent highlights and events in the community adding that “we need to mobilize all of these developers in China, help them understand what’s going on in the community and see where the gaps are, so we can all contribute together. What’s really lacking is tighter integration with the rest of the OpenStack community.”

A few quotes from the session:

“In the last year, we helped build an ecosystem of cloud computing in China,” says Yangly. After ISO published the international standard for cloud reference architecture, we created a solution based on OpenStack. “I think it’s the first time in the world and we’ll keep promoting it.”

“My biggest challenge as a developer is the time zone,” says Wang. “It’s hard for developers to stay up late enough to join the community meetings.” Language is also an issue, he says. Chinese developers are very eager to learn new technology and contribute, but it’s hard for them to speak out, so they talk less – even though they develop more. He also says that Chinese customers have a hard time using software in English, so they’ll do a lot of customization. These features, however, are often hard to upstream. This is a bigger challenge, he adds, because they’ve made a lot of effort to maintain those features locally.

Check out the full 38-minute session below.

Cover Photo // CC BY NC

The post Looking ahead to OpenStack Days China appeared first on OpenStack Superuser.

by Superuser at July 18, 2017 11:55 AM

Opensource.com

Lessons in OpenStack: New tutorials and how-tos

Are you a developer or system administrator who wants to learn more about OpenStack?

by Jason Baker at July 18, 2017 07:03 AM

July 17, 2017

RDO

Emilien Macchi: TripleO, and the Technical Committee - OpenStack PTG

Another video from the Pike Project Teams Gathering in Atlanta - here's Emilien talking about the work on the TripleO project, and about what's involved in being a member of the OpenStack Technical Committee (TC).

by Rich Bowen at July 17, 2017 05:33 PM

Recent blog posts, July 17

Here's what the RDO community has been blogging about in the last few weeks:

Create a TripleO snapshot before breaking it… by Carlos Camacho

The idea of this post is to show how developers can save some time by creating snapshots of their development environments, so they don’t have to redeploy them each time they break.

Read more at http://anstack.github.io/blog/2017/07/14/snapshots-for-your-tripleo-vms.html

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 1 by m4r1k

For Telcos considering OpenStack, one of the major areas of focus can be around network performance. While the performance discussion may often begin with talk of throughput numbers expressed in Million-packets-per-second (Mpps) values across Gigabit-per-second (Gbps) hardware, it really is only the tip of the performance iceberg.

Read more at http://redhatstackblog.redhat.com/2017/07/11/tuning-for-zero-packet-loss-in-red-hat-openstack-platform-part-1/

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 2 by m4r1k

Ready for more Fast Packets?!

Read more at http://redhatstackblog.redhat.com/2017/07/13/tuning-for-zero-packet-loss-in-red-hat-openstack-platform-part-2/

TripleO Deep Dive: Internationalisation in the UI by jpichon

Yesterday, as part of the TripleO Deep Dives series I gave a short introduction to internationalisation in TripleO UI: the technical aspects of it, as well as a quick overview of how we work with the I18n team. You can catch the recording on BlueJeans or YouTube, and below's a transcript.

Read more at http://www.jpichon.net/blog/2017/07/tripleo-deep-dive-internationalisation-ui/

by Rich Bowen at July 17, 2017 04:44 PM

OpenStack Superuser

OpenStack Korea: Embracing the cloud universe

What’s next for OpenStack? New use cases like edge computing, multi-cloud and deeper collaboration with open source technologies are on the horizon. And it’s clear the Korean OpenStack community is leading the way at their latest event themed “Embracing the Cloud Universe.”

Jaesuk Ahn and Seungkyu Ahn kicked off OpenStack Days Korea with a live demo asking Amazon’s Alexa AI to deploy OpenStack in three minutes using containers. Jaesuk and Seungkyu’s team at SK telecom embody the collaborative, cross-community spirit by actively contributing to the openstack-helm project alongside AT&T and LCOO members. Their goal is to make deploying, managing and upgrading OpenStack services as easy as possible using Kubernetes Helm. The demo also relied on an Alexa agent the team created called Taco (“the ultimate container,” of course) and weavescope to visualize the services as they came online.

Sessions at OpenStack Days Korea focused on integrating and operating open infrastructure technologies–including official OpenStack projects, as well as related projects like Ceph or Kubernetes–which reflects the trajectory of the OpenStack Summits and community.

In addition to SK telecom, several users took the stage to share their stories:

  • The Korean Advanced Institute of Science & Technology (KAIST) discussed GPU-based, serverless architecture using the emerging Picasso project
  • Naver, a massive search and web portal akin to Google in Korea, talked about unifying their infrastructure platform based on OpenStack
  • Popular social messaging platform Kakao talked about running OpenStack Trove in production
  • Samsung SDS talked about NFV and network performance enhancement with hybrid cloud
  • NetMarble, a big gaming company popular for Lineage, talked about their use of Ceph and OpenStack
OpenStack Days Korea organizers and staff.

The community in Korea has provided a great model for global events and collaboration. Many thanks to the key organizers Jaesuk Ahn, Ian Choi, Jungwon Ku, Nalee Jang, Seungkyu Ahn, and Myonghwan Yoo, as well as the users who shared their knowledge.

Find out more about the OpenStack User Group and get involved here.

Cover Photo // CC BY NC

The post OpenStack Korea: Embracing the cloud universe appeared first on OpenStack Superuser.

by Lauren Sell at July 17, 2017 01:12 AM

July 14, 2017

StackHPC Team Blog

HPC Networking in OpenStack: Part 2

This post is the second in a series on HPC networking in OpenStack. In the series we'll discuss StackHPC's current and future work on integrating OpenStack with high performance network technologies. This post discusses how the Kayobe project uses Ansible to define physical and virtual network infrastructure as code.

If you've not read it yet, why not begin with the first post in this series.

The Network as Code

Operating a network has for a long time been a difficult and high risk task. Networks can be fragile, and the consequences of incorrect configuration can be far reaching. Automation using scripts allows us to improve on this, reaping some of the benefits of established software development practices such as version control.

Ansible

The recent influx of DevOps and configuration management tools is well suited to the task of network management, with Ansible in particular having a large selection of modules for configuration of network devices. Of course, the network doesn't end at the switch, and Ansible is equally well suited to driving network configuration on the attached hosts - from simple interface configuration to complex virtual networking topologies.

OpenStack Networks

OpenStack can be deployed in a number of configurations, with various levels of networking complexity. Some of the classes of networks that may be used in an OpenStack cluster include:

Power & out-of-band management network
Access to power management devices and out-of-band management systems (e.g. BMCs) of control and compute plane hosts.
Overcloud provisioning network
Used by the seed host to provision the control plane and virtualised compute plane hosts.
Workload inspection network
Used by the control plane hosts to inspect the hardware of the bare metal compute hosts.
Workload provisioning network
Used by the control plane hosts to provision the bare metal compute hosts.
Workload cleaning network
Used by the control plane hosts to clean the bare metal compute hosts after use.
Internal network
Used by the control plane for internal communication and access to the internal and admin OpenStack API endpoints.
External network
Hosts the public OpenStack API endpoints and provides external network access for the hosts in the system.
Tenant networks
Used by tenants for communication between compute instances. Multiple networks can provide isolation between tenants. These may be overlay networks such as GRE or VXLAN tunnels but are more commonly VLANs in bare metal compute environments.
Storage network
Used by control and compute plane hosts for access to storage systems.
Storage management network
Used by storage systems for internal communication.

Hey wait, where are you going? Don't worry, not all clusters require all of these classes of networks, and in general it's possible to map more than one of these to a single virtual or physical network.

Kayobe & Kolla-ansible

Kayobe heavily leverages the Kolla-ansible project to deploy a containerised OpenStack control plane. In general, Kolla-ansible performs very little direct configuration of the hosts that it manages - most is limited to Docker containers and volumes. This leads to a very reliable and portable tool, but does leave wide open the question of how to configure the underlying hosts to the point where they can run Kolla-ansible's containers. This is where Kayobe comes in.

Host networking

Kolla-ansible takes as input the names of network interfaces that map to the various classes of network that it differentiates. The following variables should be set in globals.yml.

# Internal network
api_interface: br-eth1

# External network
kolla_external_vip_interface: br-eth1

# Storage network
storage_interface: eth2

# Storage management network
cluster_interface: eth2

In this example we have two physical interfaces, eth1 and eth2. A software bridge br-eth1 exists, into which eth1 is plugged. Kolla-ansible expects these interfaces to be up and configured with an IP address.

Rather than reinventing the wheel, Kayobe makes use of existing Ansible roles available on Ansible Galaxy. Galaxy can be a bit of a wild west, with many overlapping and unmaintained roles of dubious quality. That said, with a little perseverance it's possible to find good quality roles such as MichaelRigart.interfaces. Kayobe uses this role to configure network interfaces, bridges, and IP routes on the control plane hosts.

Here's an example of a simple Ansible playbook that uses the MichaelRigart.interfaces role to configure eth2 for DHCP, and a bridge breth1 with a static IP address, static IP route and a single port, eth1.

- name: Ensure network interfaces are configured
  hosts: localhost
  become: yes
  roles:
    - role: MichaelRigart.interfaces

      # List of Ethernet interfaces to configure.
      interfaces_ether_interfaces:
        - device: eth2
          bootproto: dhcp

      # List of bridge interfaces to configure.
      interfaces_bridge_interfaces:
        - device: breth1
          bootproto: static
          address: 192.168.1.150
          netmask: 255.255.255.0
          gateway: 192.168.1.1
          mtu: 1500
          route:
            - network: 192.168.200.0
              netmask: 255.255.255.0
              gateway: 192.168.1.1
          ports:
            - eth1

If you are following along at home, ensure that Ansible maintains a stable control connection to the hosts being configured, and that this connection is not reconfigured by the MichaelRigart.interfaces role.

Kayobe uses Open vSwitch as the Neutron ML2 mechanism for providing network services such as DHCP and routing on the control plane hosts. Kolla-ansible deploys a containerised Open vSwitch daemon, and creates OVS bridges for Neutron which are attached to existing network interfaces. Kayobe creates virtual Ethernet pairs using its veth role, and configures Kolla-ansible to connect the OVS bridges to these.
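
For illustration only, the plumbing that the veth role automates boils down to something like the following (the interface and bridge names here are made up; Kayobe derives the real ones from the network configuration):

# Create a virtual Ethernet pair: one end for the Linux bridge, one for OVS
ip link add veth-breth1 type veth peer name veth-breth1-ovs
ip link set veth-breth1 up
ip link set veth-breth1-ovs up

# Plug one end into the existing Linux bridge carrying the host IP...
ip link set veth-breth1 master breth1

# ...and hand the other end to the Neutron OVS bridge created by Kolla-ansible
ovs-vsctl add-port breth1-ovs veth-breth1-ovs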

A quick namecheck of other Galaxy roles used by Kayobe for host network configuration: ahuffman.resolv configures the DNS resolver, and resmo.ntp configures the NTP daemon. Thanks go to the maintainers of these roles.

The bigger picture

The previous examples show how one might configure a set of network interfaces on a single host, but how can we extend that configuration to cover multiple hosts in a cluster in a declarative manner, without unnecessary repetition? Ansible's combination of YAML and Jinja2 templating turns out to be great at this.

A network in Kayobe is assigned a name, which is used as a prefix for all variables that describe the network's attributes. Here's the global configuration for a hypothetical example network that would typically be added to Kayobe's networks.yml configuration file.

# Definition of 'example' network.
example_cidr: 10.0.0.0/24
example_gateway: 10.0.0.1
example_allocation_pool_start: 10.0.0.3
example_allocation_pool_end: 10.0.0.127
example_vlan: 42
example_mtu: 1500
example_routes:
  - cidr: 10.1.0.0/24
    gateway: 10.0.0.2

Defining each option as a top level variable allows them to be overridden individually, if necessary.

We define the network's IP subnet, VLAN, IP routes, MTU, and a pool of IP addresses for Kayobe to assign to the control plane hosts. Static IP addresses are allocated automatically using Kayobe's ip-allocation role, but may be manually defined by pre-populating network-allocation.yml.

There are also some per-host configuration items that allow us to define how hosts attach to networks. These would typically be added to a group or host variable file.

# Definition of network interface for 'example' network.
example_interface: breth1
example_bridge_ports:
  - eth1

Kayobe defines classes of networks which can be mapped to the actual networks that have been configured. In our example, we may want to use the example network for both internal and external control plane communication. We would then typically define the following in networks.yml.

# Map internal network communication to 'example' network.
internal_net_name: example

# Map external network communication to 'example' network.
external_net_name: example

These network classes are used to determine to which networks each host should be attached, and how to configure Kolla-ansible. The default network list may be extended if necessary by setting controller_extra_network_interfaces in controllers.yml.

The final piece of this puzzle is a set of custom Jinja2 filters that allow us to query various attributes of these networks, using the name of the network.

# Get the MTU for the 'example' network.
example_mtu: "{{ 'example' | net_mtu }}"

We can also query attributes of other hosts.

# Get the network interface for the 'example' network on host 'controller1'.
example_interface: "{{ 'example' | net_interface('controller1') }}"

Finally, we can remove the explicit reference to our site-specific network name, example.

# Get the CIDR for the internal network.
internal_cidr: "{{ internal_net_name | net_cidr }}"

The decoupling of network definitions from network classes enables Kayobe to be very flexible in how it configures a cluster. In our experience this is an area in which TripleO is a little rigid.

Further information on configuration of networks can be found in the Kayobe documentation.

The Kayobe configuration for the Square Kilometre Array (SKA) Performance Prototype Platform (P3) system provides a good example of how this works in a real system. We used the Skydive project to visualise the network topology within one of the OpenStack controllers in the P3 system. In the Pike release, Kolla-ansible adds support for deploying Skydive on the control plane and virtualised compute hosts. We had to make a small change to Skydive to fix discovery of the relationship between VLAN interfaces and their parent link, and we'll contribute this upstream. Click the image link to see it in its full glory.

Skydive on P3 controller

Physical networking

Hosts are of little use without properly configured network devices to connect them. Kayobe has the capability to manage the configuration of physical network switches using Ansible's network modules. Currently Dell OS6 and Dell OS9 switches are supported, while Juniper switches will soon be added to the list.

Returning to our example of the SKA P3 system, we note that in HPC clusters it is common to need to manage multiple physical networks.

Physical networks in the P3 deployment

Each switch is configured as a host in the Ansible inventory, with host variables used to specify the switch's management IP address and admin user credentials.

ansible_host: 10.0.0.200
ansible_user: <admin username>
ansible_ssh_password: ******

Kayobe doesn't currently provide a lot of abstraction around switch configuration - it is specified using three per-host variables.

# Type of switch. One of 'dellos6', 'dellos9'.
switch_type: dellos6

# Global configuration. List of global configuration lines.
switch_config:
  - "ip ssh server"
  - "hostname \"{{ inventory_hostname }}\""

# Interface configuration. Dict mapping switch interface names to configuration
# dicts. Each dict contains a 'description' item and a 'config' item which should
# contain a list of per-interface configuration.
switch_interface_config:
  Gi1/0/1:
    description: controller1
    config:
      - "switchport mode access"
      - "switchport access vlan {{ internal_net_name | net_vlan }}"
      - "lldp transmit-mgmt"
  Gi1/0/2:
    description: compute1
    config:
      - "switchport mode access"
      - "switchport access vlan {{ internal_net_name | net_vlan }}"
      - "lldp transmit-mgmt"

In this example we define the type of the switch as dellos6 to instruct Kayobe to use the dellos6 Ansible modules. Note the use of the custom filters seen earlier to limit the proliferation of the internal VLAN ID throughout the configuration. The Kayobe dell-switch role applies the configuration to the switches when the following command is run:

kayobe physical network configure --group <group name>

The group defines the set of switches to be configured.

Once more, the P3 system's Kayobe configuration provides some good examples. In particular, check out the configuration for one of the management switches, and the associated group variables.

Next Time

In the next article in this series we'll look at how StackHPC is working upstream on improvements to networking in the Ironic and Neutron Networking Generic Switch ML2 mechanism driver projects.

by Mark Goddard at July 14, 2017 07:00 PM

HPC Networking in OpenStack: Part 1

This post is the first in a series on HPC networking in OpenStack. In the series we'll discuss StackHPC's current and future work on integrating OpenStack with high performance network technologies. This post sets the scene and the varied networking capabilities of one of our recent OpenStack deployments, the Performance Prototype Platform (P3), built for the Square Kilometre Array (SKA) telescope's Science Data Processor (SDP).

(Not Too) Distant Cousins

There are many similarities between the cloud and HPC worlds, driving the adoption of OpenStack for scientific computing. Viewed from a networking perspective however, HPC clusters and modern cloud infrastructure can seem worlds apart.

OpenStack clouds tend to rely on overlay network technologies such as GRE and VXLAN tunnels to provide separation between tenants. These are often implemented in software, running atop a statically configured physical Ethernet fabric. Conversely, HPC clusters may feature a variety of physical networks, potentially including technologies such as Infiniband and Intel Omnipath Architecture. Low overhead access to these networks is crucial, with applications accessing the network directly in bare metal environments or via SR-IOV when running in virtual machines. Performance may be further enhanced by using NICs with support for Remote Direct Memory Access (RDMA).

Background: the SKA and its SDP

The SKA is an awe-inspiring project, to which any short description of ours is unlikely to do justice. Here's what the SKA website has to say:

The Square Kilometre Array (SKA) project is an international effort to build the world’s largest radio telescope, with eventually over a square kilometre (one million square metres) of collecting area. The scale of the SKA represents a huge leap forward in both engineering and research & development towards building and delivering a unique instrument, with the detailed design and preparation now well under way. As one of the largest scientific endeavours in history, the SKA will bring together a wealth of the world’s finest scientists, engineers and policy makers to bring the project to fruition.

The SDP Consortium forms part of the SKA project, aiming to build a supercomputer-scale computing facility to process and store the data generated by the SKA telescope. The data ingested by the SDP is expected to exceed the global Internet traffic per day. Phew!

Artist's impression of SKA dishes in South Africa

The SKA will use around 3000 dishes, each 15 m in diameter. Credit: SKA Organisation

Performance Prototype Platform: a High Performance Melting Pot

The SDP architecture is still being developed, but is expected to incorporate the concept of a compute island, a scalable unit of compute resources and associated network connectivity. The SDP workloads will be partitioned and scheduled across these compute islands.

During its development, a complex project such as the SDP has many variables and unknowns. For the SDP this includes a variety of workloads and an assortment of new hardware and software technologies which are becoming available.

The Performance Prototype Platform (P3) aims to provide a platform that roughly models a single compute island, and allows SDP engineers to evaluate a number of different technologies against the anticipated workloads. P3 provides a variety of interesting compute, storage and network technologies including GPUs, NVMe memory, SSDs, high speed Ethernet and Infiniband.

OpenStack offers a compelling solution for managing the diverse infrastructure in the P3 system, and StackHPC is proud to have built an OpenStack management plane that allows the SDP team to get the most out of the system. The compute plane is managed as a bare metal compute resource using Ironic. The Magnum and Sahara services allow the SDP team to explore workloads based on container and data processing technologies, taking advantage of the native performance provided by bare metal compute.

How Many Networks?

The P3 system features multiple physical networks with different properties:

  • 1GbE out of band management network for BMC management
  • 10GbE control and provisioning network for bare metal provisioning, private workload communication and external network access
  • 25/100GbE Bulk Data Network (BDN)
  • 100Gbit/s EDR Infiniband Low Latency Network (LLN)
Physical networks in the deployment

On this physical topology we provision a set of static VLANs for the control plane and external network access, and dynamic VLANs for use by workloads. Neutron manages the control/provisioning network switches, but due to current limitations in ironic it cannot also manage the BDN or LLN, so these are provided as a shared resource.

The complexity of the networking in the P3 system means that automation is crucial to making the system manageable. With the help of Ansible's network modules, the Kayobe deployment tool is able to configure the physical and virtual networks of the switches and control plane hosts using a declarative YAML format.

Ironic's networking capabilities are improving rapidly, adding features such as multi-tenant network isolation and port groups but still have a way to go to reach parity with VMs. In a later post we'll discuss the work being done upstream in ironic by StackHPC to support multiple physical networks.

Next Time

In the next article in this series we'll discuss how the Kayobe project uses Ansible's network modules to define physical and virtual network infrastructure as code.

by Mark Goddard at July 14, 2017 05:00 PM

OpenStack Superuser

OpenStack deployment with Kolla Ansible made easy

When the 2016 OpenStack User Survey revealed that, while OpenStack adoption was steadily increasing, deployment was still a pain point, the call for further improvement was acknowledged.

Using OpenStack’s Kolla-Ansible we took on the challenge of establishing a method for deploying a fully mature cloud platform within an hour. Leveraging OpenStack Kolla’s production-ready Docker containers and support for complete customization of configurations, a deployment guide, suitable for novice users, was created.

Separate from the upstream Kolla-Ansible guide, this step-by-step document uses bash and Python scripts, as well as Ansible playbooks, to make the deployment of an OpenStack cloud fast and simple. The guide is broken down into two parts:

  1. Provisioning bare metal: This section describes how to provision your own bare metal server using the open source tool, Cobbler.
  2. Deploying OpenStack:
    1. Creating Docker Registry: The best way to create a docker registry on your deployment host.
    2. Configuring OpenStack services: How to prepare your deployment host for Kolla-Ansible, and deploying OpenStack with all core projects using Kolla-Ansible (see the command sketch after this list).
    3. Validating your deployment: Running a bash script to test and validate your deployment.
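
For a rough idea of what the Kolla-Ansible portion of the guide wraps, here is a minimal sketch of the standard upstream workflow; the inventory file name is a placeholder, and the guide's own scripts automate these steps along with the registry and configuration work described above:

# Prepare the target hosts (Docker and other prerequisites)
kolla-ansible -i ./multinode bootstrap-servers

# Check that the hosts meet the deployment requirements
kolla-ansible -i ./multinode prechecks

# Deploy the OpenStack services as containers
kolla-ansible -i ./multinode deploy

# Generate an admin openrc file for the new cloud
kolla-ansible -i ./multinode post-deploy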

To test our solution, the DevOps team followed the guide to provision 100 bare-metal servers, deploy OpenStack, then validate the deployment. The time taken for deployment by these experienced users was just under 30 minutes. However, the real test would be finding out how easy it would be for enthusiasts, with no prior deployment experience, to get up and running using the guide.

Developers who were novices to the task of deploying OpenStack were asked to use the guide and deploy a 22-node OpenStack cloud offering compute, networking and object storage services. Some 21 developers participated in deployments over the span of four weeks. Each participant was asked to record their time taken for each of the three phases of deployment and provide feedback related to the guide’s usability and accuracy.

The following visualizations show the average time it took developers to deploy a multi-node OpenStack cloud:


Using the feedback from each installation run, the guide was continuously updated, and the time taken in the provisioning, preparation, and deployment phases consistently decreased. All 21 of the novice installs took less than 60 minutes; the final iteration averaged just 38 minutes.

The detailed ease-of-use guide can be obtained here.

This tutorial was created by Shashank Tavildar and Ianeta Hutchinson as part of their work on the team at the OpenStack Innovation Center.

The post OpenStack deployment with Kolla Ansible made easy appeared first on OpenStack Superuser.

by Shashank Tavildar and Ianeta Hutchinson at July 14, 2017 01:55 PM

Carlos Camacho

Create a TripleO snapshot before breaking it...

The idea of this post is to show how developers can save some time by creating snapshots of their development environments, so they don’t have to redeploy them each time they break.

So, don’t waste time re-deploying your environment when testing submissions.

I’ll show here how to be a little more agile when deploying your Undercloud/Overcloud for testing purposes.

Deploying a fully working development environment takes around 3 hours with human supervision… And breaking it just after it’s deployed is not cool at all…

Step 1

Deploy your environment as usual.

Step 2

Create your Undercloud/Overcloud snapshots. Do this as the stack user, otherwise virsh won’t see the VMs.

# The VMs deployed are:
# $vms will have something like the next line...
# vms=( "undercloud" "control_0" "compute_0" )
vms=( $(virsh list --all | grep running | awk '{print $2}') )

# List all VMs
virsh list --all

# List current snapshots
for i in "${vms[@]}"; \
do \
virsh snapshot-list --domain "$i"; \
done

# Dump VMs XML and check for qemu
for i in "${vms[@]}"; \
do \
virsh dumpxml "$i" | grep -i qemu; \
done

# Create an initial snapshot for each VM
for i in "${vms[@]}"; \
do \
echo "virsh snapshot-create-as --domain $i --name $i-fresh-install --description $i-fresh-install --atomic"; \
virsh snapshot-create-as --domain "$i" --name "$i"-fresh-install --description "$i"-fresh-install --atomic; \
done

# List current snapshots (After they should be already created)
for i in "${vms[@]}"; \
do \
virsh snapshot-list --domain "$i"; \
done

#########################################################################################################
# Current libvirt version does not support live snapshots.
# error: Operation not supported: live disk snapshot not supported with this QEMU binary
# --disk-only and --live not yet available.

# Create the folder for the images
# cd
# mkdir ~/backup_images

# for i in "${vms[@]}"; \
# do \
# echo "<domainsnapshot>" > $i.xml; \
# echo "  <memory snapshot='external' file='/home/stack/backup_images/$i.mem.snap2'/>" >> $i.xml; \
# echo "  <disks>" >> $i.xml; \
# echo "    <disk name='vda'>" >> $i.xml; \
# echo "      <source file='/home/stack/backup_images/$i.disk.snap2'/>" >> $i.xml; \
# echo "    </disk>" >> $i.xml; \
# echo "  </disks>" >> $i.xml; \
# echo "</domainsnapshot>" >> $i.xml; \
# done

# for i in "${vms[@]}"; \
# do \
# echo "virsh snapshot-create $i --xmlfile ~/$i.xml --atomic"; \
# virsh snapshot-create $i --xmlfile ~/$i.xml --atomic; \
# done

Step 3

Break your environment xD

Step 4

Restore your snapshots

# Commented for safety reasons...
# i=compute_0
i=blehblehbleh
virsh list --all
virsh shutdown $i
sleep 120
virsh list --all
virsh snapshot-revert --domain $i --snapshotname $i-fresh-install --running
virsh list --all

by Carlos Camacho at July 14, 2017 12:00 AM

July 13, 2017

Red Hat Stack

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 2

Ready for more Fast Packets?!

In Part 1 we reviewed the fundamentals of achieving zero packet loss, covering the concepts behind the process. In this next instalment, Federico Iezzi, EMEA Cloud Architect with Red Hat, continues his series, diving deep into the details behind the tuning.

Buckle in and join the fast lane of packet processing!


Getting into the specifics

It’s important to understand the components we’ll be working with for the tuning. Achieving our goal of zero packet loss begins right at the core of Red Hat OpenStack Platform: Red Hat Enterprise Linux (RHEL).

The tight integration between these products is essential to our success here and really demonstrates how the solid RHEL foundation is an incredibly powerful aspect of Red Hat OpenStack Platform.

So, let’s do it …

SystemD CPUAffinity

The SystemD CPUAffinity setting allows you to indicate which CPU cores should be used when SystemD spawns new processes. Since it only works for SystemD-managed services, two things should be noted. Firstly, kernel threads have to be managed in a different way, and secondly, all user-executed processes must be handled very carefully, as they might interrupt either the PMD threads or the VNFs. So, CPUAffinity is, in a way, a simplified replacement for the kernel boot parameter isolcpus. Of course, isolcpus does much more, such as disabling kernel and process thread balancing, but it can often be counter-productive unless you are doing real-time and shouldn’t be used.

Photo: CC0-licensed (Fritzchens Fritz)
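
As a minimal sketch (the core numbers are only an example and must match your own partitioning plan), the housekeeping cores are handed to SystemD in /etc/systemd/system.conf so that everything it spawns stays off the isolated cores:

# /etc/systemd/system.conf - keep SystemD-spawned processes on the housekeeping cores
# (a reboot, or a daemon re-exec plus service restarts, is needed for it to apply everywhere)
[Manager]
CPUAffinity=0 1 20 21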

So, what happened to isolcpus?
Isolcpus was the way, until a few years ago, to isolate both kernel and user processes to specific CPU cores. To make it more real-time oriented, the load balancing between the isolated CPU cores was disabled. This means that once a thread (or a set of threads) is created on an isolated CPU core, even if it is at 100% usage, the Linux process scheduler (SCHED_OTHER) will never move any of those threads away. For more info check out this article on the Red Hat Customer Portal (registration required).

IRQBALANCE_BANNED_CPUS

The IRQBALANCE_BANNED_CPUS setting allows you to indicate which CPU cores should be skipped when rebalancing the IRQs. CPU core numbers which have their corresponding bits set to one in this mask will not have any IRQs assigned to them on rebalance (this can be double checked at /proc/interrupts).
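
On Red Hat Enterprise Linux this is typically set in /etc/sysconfig/irqbalance, and the value is a hexadecimal CPU mask. A minimal sketch, assuming cores 2-19 are the isolated ones:

# /etc/sysconfig/irqbalance - never assign IRQs to cores 2-19 (bits 2-19 set)
IRQBALANCE_BANNED_CPUS=ffffc

# After restarting the irqbalance service, verify that no device IRQs
# are being delivered to the banned cores
cat /proc/interrupts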

Tickless Kernel

Setting the kernel boot parameter nohz prevents frequent timer interrupts. In this case it is common practice to refer to a system as “tickless.” The tickless kernel feature enables “on-demand” timer interrupts: if there is no timer to be expired for, say, 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds. The result will be fewer interrupts per second, instead of scheduler interrupts occurring every 1 ms.

Adaptive-Ticks CPUs

Setting the kernel boot parameter nohz_full to the specific isolated CPU core values ensures the kernel doesn’t send scheduling-clock interrupts to CPUs with a single runnable task. Such CPUs are said to be “adaptive-ticks CPUs.” This is important for applications with aggressive real-time response constraints because it allows them to improve their worst-case response times by the maximum duration of a scheduling-clock interrupt. It is also important for computationally intensive short-iteration workloads: If any CPU is delayed during a given iteration, all the other CPUs will be forced to wait in idle while the delayed CPU finishes. Thus, the delay is multiplied by one less than the number of CPUs. In these situations, there is again strong motivation to avoid sending scheduling-clock interrupts. Finally, adaptive-ticks CPUs must have their RCU callbacks offloaded.

Photo: CC0-licensed. (Roland Tanglao)

RCU Callbacks Offload

The kernel boot parameter rcu_nocbs, when set to the value of the isolated CPU cores, causes the offloaded CPUs to never queue RCU callbacks and therefore RCU never prevents offloaded CPUs from entering either dyntick-idle mode or adaptive-tick mode.

Fixed CPU frequency scaling

The kernel boot parameter, intel_pstate, when set to disable disables the CPU frequency scaling, setting the CPU frequency to the maximum allowed by the CPU. Having adaptive, and therefore varying, CPU frequency results in unstable performance.

nosoftlockup

Photo: CC0-licensed. (Alan Levine)

The kernel boot parameter nosoftlockup disables logging of backtraces when a process executes on a CPU for longer than the softlockup threshold (default is 120 seconds). Typical low-latency programming and tuning techniques might involve spinning on a core or modifying scheduler priorities/policies, which can lead to a task reaching this threshold. If a task has not relinquished the CPU for 120 seconds, the kernel will print a backtrace for diagnostic purposes.
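
Pulling the boot-time settings above together, a purely illustrative kernel command line for a host with cores 2-19 isolated might carry the following (in a Red Hat OpenStack Platform deployment these are normally generated by the tuned cpu-partitioning profile and the director templates rather than edited by hand):

# Appended to GRUB_CMDLINE_LINUX in /etc/default/grub
nohz=on nohz_full=2-19 rcu_nocbs=2-19 intel_pstate=disable nosoftlockup

# Regenerate the GRUB configuration and reboot for the parameters to take effect
grub2-mkconfig -o /boot/grub2/grub.cfg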

Dirty pages affinity

Setting the /sys/bus/workqueue/devices/writeback/cpumask value to the specific cpu cores that are not isolated creates an affinity with the kernel thread which prefers to write dirty pages.

Execute workqueue requests

Setting the /sys/devices/virtual/workqueue/cpumask value to the cpu cores that are not isolated defines which kworker should receive which kernel task to do such things as interrupts, timers, I/O, etc.

Disable Machine Check Exception

Setting the /sys/devices/system/machinecheck/machinecheck*/ignore_ce value to 1, disables machine check exceptions. The MCE is a type of computer hardware error that occurs when a computer’s central processing unit detects a hardware problem.
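
As a minimal sketch, assuming cores 0, 1, 20 and 21 are the non-isolated housekeeping cores (hexadecimal mask 300003), the three settings above translate to:

# Keep dirty-page writeback kernel threads on the housekeeping cores
echo 300003 > /sys/bus/workqueue/devices/writeback/cpumask

# Keep generic kernel workqueue workers on the housekeeping cores
echo 300003 > /sys/devices/virtual/workqueue/cpumask

# Stop polling for correctable machine check errors on every CPU
for mc in /sys/devices/system/machinecheck/machinecheck*/ignore_ce; do echo 1 > "$mc"; done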

KVM Low Latency

Both the standard KVM module and the Intel KVM module support a number of options to reduce latency by removing unwanted VM exits and interrupts.

Photo: CC0-licensed. (Marsel Minga)

Pause Loop Exiting

In the kvm_intel module, the parameter ple_gap is set to 0.
Full details can be found on page 37 of “Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3C: System Programming Guide, Part 3 (pdf)”

Periodic Kvmclock Sync

In the kvm module, the parameter kvmclock_periodic_sync is set to 0.
Full details are found in the upstream kernel commit.
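
Outside of the tuned profile, the equivalent by hand would be a modprobe drop-in such as the following (the file name is arbitrary, and the options take effect the next time the modules are loaded):

# /etc/modprobe.d/kvm-low-latency.conf
options kvm kvmclock_periodic_sync=0
options kvm_intel ple_gap=0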

SYSCTL Parameters
Some sysctl parameters are inherited because the cpu-partitioning tuned profile includes the network-latency profile, which in turn includes latency-performance. Below are the essential parameters for achieving zero packet loss:

  • kernel.hung_task_timeout_secs = 600: Increases the hung task timeout; however, no error will be reported given the nosoftlockup kernel boot parameter. From the cpu-partitioning profile.
  • kernel.nmi_watchdog = 0: Disables the Non-Maskable Interrupt watchdog (a type of IRQ which gets force executed). From the cpu-partitioning profile.
  • vm.stat_interval = 10: Sets the refresh rate of the virtual memory statistics update; the default value is 1 second. From the cpu-partitioning profile.
  • kernel.timer_migration = 1: In an SMP system, tasks are scheduled on different CPUs by the scheduler and interrupts are balanced across all of the available CPU cores by the irqbalance daemon, but timers remain stuck on the CPU core which created them. Enabling the timer_migration option (recent Linux kernels – https://bugzilla.redhat.com/show_bug.cgi?id=1408308) will always try to migrate timers away from the nohz_full CPU cores. From the cpu-partitioning profile.
  • kernel.numa_balancing = 0: Disables the automatic NUMA process balancing across the NUMA nodes. From the network-latency profile.
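
If you need to apply or verify these outside of tuned, the standard sysctl tooling is enough; a minimal sketch:

# Persist the values from the list above
cat > /etc/sysctl.d/99-zero-packet-loss.conf <<EOF
kernel.hung_task_timeout_secs = 600
kernel.nmi_watchdog = 0
vm.stat_interval = 10
kernel.timer_migration = 1
kernel.numa_balancing = 0
EOF

# Load the file and double-check a couple of the running values
sysctl -p /etc/sysctl.d/99-zero-packet-loss.conf
sysctl kernel.nmi_watchdog kernel.numa_balancing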

Disable Transparent Hugepages

Setting the option “transparent_hugepages” to “never” disables transparent hugepages, the kernel mechanism that automatically merges smaller memory pages (4K) into bigger memory pages (usually 2M).
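
This can be done either at boot time or at runtime, for example:

# At boot: append to the kernel command line
transparent_hugepage=never

# At runtime: takes effect immediately but does not survive a reboot
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# The currently active value is shown in square brackets
cat /sys/kernel/mm/transparent_hugepage/enabled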

Tuned parameters

The following tuned parameters should be configured to provide low-latency and disable power saving mechanisms. Setting the CPU governor to “performance” runs the CPU at the maximum frequency.

  • force_latency = 1
  • governor = performance
  • energy_perf_bias = performance
  • min_perf_pct = 100
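
In a tuned profile these settings live in the [cpu] section of the profile's tuned.conf; a minimal sketch of the relevant fragment:

# Fragment of a tuned profile's tuned.conf
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100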

Speeding to a conclusion

As you can see, there is a lot of preparation and tuning that goes into achieving zero packet loss. This blogpost detailed many parameters that require attention and tuning to make this happen.

Next time the series finishes with an example of how this all comes together!

Part 3 up now, check it out.

Love all this deep tech? Want to ensure you keep your Red Hat OpenStack Platform deployment rock solid? Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

by m4r1k at July 13, 2017 11:44 PM

James Page

Ubuntu OpenStack Dev Summary – 13th July 2017

Welcome to the fourth Ubuntu OpenStack development summary!

This summary is intended to be a regular communication of activities and plans happening in and around Ubuntu OpenStack, covering but not limited to the distribution and deployment of OpenStack on Ubuntu.

If there is something that you would like to see covered in future summaries, or you have general feedback on content please feel free to reach out to me (jamespage on Freenode IRC) or any of the OpenStack Engineering team at Canonical!

OpenStack Distribution

Stable Releases

We still have a few SRU’s in-flight from the June SRU cadence:

Swift: swift-storage processes die if rsyslog is restarted (Kilo, Mitaka)
https://bugs.launchpad.net/ubuntu/trusty/+source/swift/+bug/1683076

Ocata Stable Point Releases
https://bugs.launchpad.net/ubuntu/+bug/1696139

Hopefully those should flush through to updates in the next week; in the meantime we’re preparing to upload fixes for:

Keystone: keystone-manage mapping_engine federation rule testing
https://bugs.launchpad.net/ubuntu/+bug/1655182

Neutron: router host binding id not updated after failover
https://bugs.launchpad.net/ubuntu/+bug/1694337

Development Release

The first Ceph Luminous RC (12.1.0) has been uploaded to Artful and will be backported to the Ubuntu Cloud Archive for Pike soon.

OpenStack Pike b3 is due towards the end of July; we’ve done some minor dependency updates to support progression towards that goal. It’s also possible to consume packages built from the tip of the upstream git repository master branches using:

sudo add-apt-repository ppa:openstack-ubuntu-testing/pike

Packages are automatically built for Artful and Xenial.

OpenStack Snaps

Refactoring to support the switch back to strict mode snaps has been completed. Corey posted last week on ‘OpenStack in a Snap’ so we’ll not cover too much in this update; have a read to get the full low down.

Work continues on snapstack (the CI test tooling for OpenStack snap validation and testing), with changes landing this week to support Class-based setup/cleanup for the base cloud and a logical step/plan method for creating tests.

The move of snapstack to a Class-based setup/cleanup approach for the base cloud enables flexibility where the base cloud required to test a snap can easily be updated. By default this will provide a snap’s tests with a default OpenStack base cloud, however this can now easily be manipulated to add or remove services.

The snapstack code has also been updated to use a step/plan method for creating tests. These objects provide a simple and logical process for creating tests. The developer can now define the snap being tested, and its scripts/tests, in a step object. Each base snap and its scripts/tests are also defined in individual step objects. All of these steps are then put together into a plan object, which is executed to kick off the deployment and tests.

For more details on snapstack you can check out the snapstack code here.

Nova LXD

The refactoring of the VIF plugging codebase to provide support for Linuxbridge and Open vSwitch + the native OVS firewall driver has been landed for Pike; this corrects a number of issues in the VIF plugging workflow between Neutron and Nova(-LXD) for these specific tenant networking configurations.

The nova-lxd subteam have also done some much needed catch-up on pull requests for pylxd (the underlying Python binding for LXD that nova-lxd uses); pylxd 2.2.4 is now up on pypi and includes fixes for improved forward compatibility with new LXD releases and support for passing network timeout configuration for API calls.

Work is ongoing to add support for LXD storage pools into pylxd.

OpenStack Charms

New Charms

Work has started on the new Gnocchi and GlusterFS charms; These should be up and consumable under the ‘openstack-charmers-next’ team on the charm store in the next week.

Gnocchi will support deployment with MySQL (for indexing), Ceph (for storage) and Memcached (for coordination between Gnocchi metricd workers). We’re taking the opportunity to review and refresh the telemetry support across all of the charms, ensuring that the charms are using up-to-date configuration options and are fully integrated for telemetry reporting via Ceilometer (with storage in Gnocchi). This includes adding support for the Keystone, Rados Gateway and Swift charms. We’ll also be looking at the Grafana Gnocchi integration and hopefully coming up with some re-usable sets of dashboards for OpenStack resource metric reporting.

Deployment Guide

Thanks to help from Graham Morrison in the Canonical docs team, we now have a first cut of the OpenStack Charms Deployment Guide – you can take a preview look in its temporary home until we complete the work to move it up under docs.openstack.org.

This is very much a v1, and the team intends to iterate on the documentation over time, adding coverage for things like high-availability and network space usage both in the charms and in the tools that the charms rely on (MAAS and Juju).

IRC (and meetings)

As always, you can participate in the OpenStack charm development and discussion by joining the #openstack-charms channel on Freenode IRC; we also have a weekly development meeting in #openstack-meeting-4 at either 1000 UTC (odd weeks) or 1700 UTC (even weeks) – see http://eavesdrop.openstack.org/#OpenStack_Charms for more details.

EOM


by JavaCruft at July 13, 2017 03:19 PM

OpenStack Superuser

OpenStack Neutron: an NFV primer

Rossella Sblendido, a software engineer at SUSE and core reviewer for Neutron, knows that people have strong opinions about OpenStack’s networking-as-a-service project.

In a recent talk at the OPNFV Summit, Sblendido kicked things off with three emojis:

While she wasn’t sure she could make people love Neutron, she aimed to clear up some of the confusion, especially for NFV folks. She outlines the history of the project, its architecture and plug-ins, plus provides some insight into the compute node and the L2, L3, DHCP and Metadata agents, achieving traffic isolation and packet flow with OVS.


Ildikó Vancsa, ecosystem technical lead at the OpenStack Foundation, paired up with Sblendido for the second half of the talk, diving into future plans and specific functionalities for NFV. She talks about quality of service (QoS) issues including minimum bandwidth support, egress traffic, the need for scheduling improvements and upcoming initiatives to limit ingress bandwidth including OVS agent support and those around Ethernet networking standard QinQ.

You can catch the whole 31-minute talk below:

Cover Photo // CC BY NC

The post OpenStack Neutron: an NFV primer appeared first on OpenStack Superuser.

by Superuser at July 13, 2017 01:57 PM

NFVPE @ Red Hat

yakLab Part 1a: Building the virtual Cobbler deployment

In this scene I’ll discuss how I’ve built out a local Cobbler deployment into my virtual host in order to bootstrap the operating system onto my baremetal nodes via kickstart files and PXE booting.

by Leif Madsen at July 13, 2017 12:45 AM

July 12, 2017

NFVPE @ Red Hat

yakLab build out

The yakLab is a place where yaks are electronically instantiated for the purpose of learning and documenting. The lab consists of a virtualization host (virthost) which has 64GB of memory and hosts all the virtual machines, primarily for infrastructure.

by Leif Madsen at July 12, 2017 07:00 PM

OpenStack Superuser

Superuser Awards nominations open for OpenStack Summit

Nominations for the OpenStack Summit Sydney Superuser Awards are open and will be accepted through midnight Pacific Time September 8.

All nominees will be reviewed by the community and the Superuser editorial advisors will determine the winner that will be announced onstage at the Summit in November.

The Superuser Awards recognize teams using OpenStack to meaningfully improve business and differentiate in a competitive industry, while also contributing back to the community.

Teams of all sizes are encouraged to apply. If you fit the bill, or know a team that does, we encourage you to submit a nomination here.

Each team should submit the application in the appropriate category. After the community has a chance to review all nominees, the Superuser editorial advisors will narrow it down to four finalists and select the winner.

Launched at the Paris Summit in 2014, the awards have been presented at every Summit to users who show how OpenStack is making a difference and provide strategic value in their organization. Past winners include CERN, Comcast, NTT Group, AT&T and China Mobile.

The OpenStack community will have the chance to review the list of nominees, how they are running OpenStack, what open source technologies they are using and the ways they are contributing back to the OpenStack community.

Then, the Superuser editorial advisors will review the submissions, narrow the nominees down to four finalists and review the finalists to determine the winner based on the submissions.

When evaluating winners for the Superuser Award, judges take into account the unique nature of use case(s), as well as integrations and applications of OpenStack performed by a particular team.

Additional selection criteria includes how the workload has transformed the company’s business, including quantitative and qualitative results of performance as well as community impact in terms of code contributions, feedback, knowledge sharing and the number of Certified OpenStack Administrators (COAs) on staff.

Winners will take the stage at the OpenStack Summit in Sydney. Submissions are open now until September 8, 2017. You’re invited to nominate your team or nominate a Superuser here.

For more information about the Superuser Awards, please visit http://superuser.openstack.org/awards.

The post Superuser Awards nominations open for OpenStack Summit appeared first on OpenStack Superuser.

by Allison Price at July 12, 2017 12:30 PM

July 11, 2017

Red Hat Stack

Tuning for Zero Packet Loss in Red Hat OpenStack Platform – Part 1

For Telcos considering OpenStack, one of the major areas of focus can be around network performance. While the performance discussion may often begin with talk of throughput numbers expressed in Million-packets-per-second (Mpps) values across Gigabit-per-second (Gbps) hardware, it really is only the tip of the performance iceberg. The most common requirement is to have absolutely stable and deterministic network performance (Mpps and latency) over the absolutely fastest possible throughput. With that in mind, many applications in the Telco space require low latency that can only tolerate zero packet loss.

In this “Operationalizing OpenStack” blogpost Federico Iezzi, EMEA Cloud Architect with Red Hat, discusses some of the real-world deep tuning and process required to make zero packet loss a reality!


Packet loss is bad for business …

Packet loss can be defined as occurring “when one or more packets of data travelling across a computer network fail to reach their destination [1].” Packet loss results in protocol latency as losing a TCP packet requires retransmission, which takes time. What’s worse, protocol latency manifests itself externally as “application delay.” And, of course, “application delay” is nothing more than a fancy term for something that all Telco’s want to avoid: a fault. So, as network performance degrades, and packets drop, retransmission occurs at higher and higher rates.  The more retransmission the more latency experienced and the slower the system gets. With increased packets due to this retransmission we also see increased congestion slowing the system even further.


Tune in now for better performance …

So how do we prepare OpenStack for Telco? 

Photo CC0-licensed (Alan Levine)

It’s easy! Tuning!

Red Hat OpenStack Platform is supported by a detailed Network Functions Virtualization (NFV) Reference Architecture which offers a lot of deep tuning across multiple technologies ranging from Red Hat Enterprise Linux to the Data Plane Development Kit (DPDK) from Intel.  A great place to start is with the Red Hat Network Functions Virtualization (NFV) Product Guide. It covers tuning for the following components:

  • Red Hat Enterprise Linux version 7.3
  • Red Hat OpenStack Platform version 10 or greater
  • Data plane tuning
    • Open vSwitch with DPDK at least version 2.6
    • SR-IOV VF or PF
  • System Partitioning through Tuned using profile cpu-partitioning at least version 2.8
  • Non-uniform memory access (NUMA) and virtual non-uniform memory access (vNUMA)
  • General OpenStack configuration

Hardware notes and prep …

It’s worth mentioning that the hardware to be used in achieving zero packet loss often needs to be of the latest generation. Hardware decisions around network interface cards and vendors can often affect packet loss and tuning success. For hardware, be sure to consult your vendor’s documentation prior to purchase to ensure the best possible outcomes. Ultimately, regardless of hardware, some setup should be done in the hardware BIOS/UEFI for stable CPU frequency while removing power saving features.

Photo CC0-licensed (PC Gehäuse)

Setting Value
MLC Streamer Enabled
MLC Spatial Prefetcher Enabled
Memory RAS and Performance Config Maximum Performance
NUMA optimized Enabled
DCU Data Prefetcher Enabled
DCA Enabled
CPU Power and Performance Performance
C6 Power State Disabled
C3 Power State Disabled
CPU C-State Disabled
C1E Autopromote Disabled
Cluster-on-Die Disabled
Patrol Scrub Disabled
Demand Scrub Disabled
Correctable Error 10
Intel(R) Hyper-Threading Disabled or Enabled
Active Processor Cores All
Execute Disable Bit Enabled
Intel(R) Virtualization Technology Enabled
Intel(R) TXT Disabled
Enhanced Error Containment Mode Disabled
USB Controller Enabled
USB 3.0 Controller Auto
Legacy USB Support Disabled
Port 60/64 Emulation Disabled


BIOS Settings from:

Divide and Conquer …

Properly enforcing resource partitioning is essential in achieving zero packet loss performance and to do this you need to partition the resources between the host and the guest correctly. System partitioning ensures that software resources running on the host are always given access to dedicated hardware. However, partitioning goes further than just access to hardware as it can be used to ensure that resources utilize the closest possible memory addresses across all the processors. When a CPU retrieves data from a memory address it first looks at the local cache on the local processor core itself. Proper partitioning, via tuning, ensures that requests are answered from the closest cache (L1, L2 or L3 cache) as well as from the local memory, minimizing transaction times and the usage of a point-to-point processor interconnection bus such as the QPI (Intel QuickPath Interconnect). This way of accessing and dividing the memory is defined as NUMA (non-uniform memory access) design.
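
Before deciding on a partitioning layout it helps to see the NUMA topology you are actually working with; the standard tools are enough for that (the interface name below is just an example):

# Sockets, cores, threads and NUMA node assignments
lscpu

# Per-NUMA-node CPU lists, memory sizes and inter-node distances
numactl --hardware

# NUMA node to which a given network adapter is attached
cat /sys/class/net/eth1/device/numa_node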

Tuned in …

System partitioning involves a lot of complex, low-level tuning. So how does one do this easily?

You’ll need to use the tuned daemon along with the accompanying cpu-partitioning profile. Tuned is a daemon that monitors the use of system components and dynamically tunes system settings based on that monitoring information. Tuned is distributed with a number of predefined profiles for common use cases. For all this to work, you’ll need the newest tuned features. This requires the latest version of tuned (i.e. 2.8 or later) as well as the latest tuned cpu-partitioning profile (i.e. 2.8 or later). Both are available publicly via the Red Hat Enterprise Linux 7.4 beta release, or you can grab the daemon and profiles directly from their upstream projects.
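
As a minimal sketch of how the profile is applied on a Red Hat Enterprise Linux host – assuming here, purely for illustration, that CPUs 2-19 are the ones to isolate for PMD and VNF threads:

# install the profile (ships with tuned >= 2.8, e.g. on RHEL 7.4)
sudo yum install -y tuned tuned-profiles-cpu-partitioning

# tell the profile which cores to isolate from the housekeeping CPUs
echo 'isolated_cores=2-19' | sudo tee /etc/tuned/cpu-partitioning-variables.conf

# activate the profile and reboot so the kernel command line changes take effect
sudo tuned-adm profile cpu-partitioning
sudo reboot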

Interested in the latest generation of Red Hat Enterprise Linux? Be the first to know when it is released by following the official Red Hat Enterprise Linux Blog!


However, before any tuning can begin, you must first decide how the system should be partitioned.

[Screenshot: an example partitioning of a compute node's CPU cores]

Based on Red Hat experience with customer deployments, we usually find it necessary to define how the system should be partitioned for every specific compute model. In the example pictured above, the total number of PMD cores – where one CPU core provides two CPU threads – had to be carefully calculated from the overall required Mpps as well as the total number of DPDK interfaces, both physical and vPort. An unbalanced ratio of PMD cores to DPDK ports will result in lower performance and in interrupts that generate packet loss. The rest of the tuning was for the VNF threads, reserving at least one core per NUMA node for the operating system.
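
As a hypothetical illustration of the end result – say the calculation lands on dedicating host cores 2 and 3 to PMD threads (in a real deployment this is set per NUMA node through the deployment templates rather than by hand):

# cores 2 and 3 correspond to the bitmask 0b1100 = 0xC
sudo ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xC

# check which PMD thread ended up polling which rx queue
sudo ovs-appctl dpif-netdev/pmd-rxq-show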


Looking for more great ways to ensure your Red Hat OpenStack Platform deployment is rock solid? Check out the Red Hat Services Webinar Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud today! 



Looking at the upstream templates as well as the tuned cpu-partitioning profile, there is a lot to understand about the specific settings that are applied to each core per NUMA node.

So, just what needs to be tuned? Find out more in Part 2 where you’ll get a thorough and detailed breakdown of many specific tuning parameters to help achieve zero packet loss!


The “Operationalizing OpenStack” series features real-world tips, advice and experiences from experts running and deploying OpenStack.

by m4r1k at July 11, 2017 11:52 PM

OpenStack Blog

OpenStack Developer Mailing List Digest July 1-8

Important Dates

  • July 14, 2017 23:59 OpenStack Summit Sydney Call for Presentations closes 1.
  • Around R-3 and R-4 (July 31 – August 11, 2017) PTL elections 2
  • All 3

Summaries

  • TC status update by Thierry 4
  • API Working Group news 5
  • Nova placement/resource providers update 6

SuccessBot Says

  • pabelanger on openstack-infra 7: opensuse-422-infracloud-chocolate-8977043 launched by nodepool
  • clark on openstack-infra 8: infra added citycloud to the pool of test nodes.
  • fungi on openstack-infra 9: OpenStack general mailing list archives from Launchpad (July 2010 to July 2013) have been imported into the current general archive on lists.openstack.org.
  • adreaf on openstack-qa 10: Tempest ssh validation running by default in the gate on master.
  • All 11

Most Supported Goals And Improving Goal Completion

  • Community wide goals discussions started at the OpenStack Forum, then the mailing list and IRC for those that couldn’t be at the Forum.
    • These discussions help the TC make decisions on which goals will be set for a release.
  • Potential goals:
    • Split Tempest plugins into separate repos/projects 12
    • Move policy and docs into code 13
  • Goals in Pike haven’t really been reached.
  • An idea from the meeting to address this is creating a role called “Champions”: drum beaters who help get a goal done by helping projects track status, and sometimes by doing code patches.
  • Interested volunteers should have a good understanding of their selected goal and its implementation, so they can act as a trusted person.
  • From the discussion in the thread, it seems we’re mostly in agreement with the Champion idea.
    • We have a volunteer for splitting out tempest plugins into repos/projects.
  • Full thread 14

 

  1. https://www.openstack.org/summit/sydney-2017/call-for-presentations/
  2. http://lists.openstack.org/pipermail/openstack-dev/2017-July/119359.html
  3. https://www.openstack.org/community/events/
  4. http://lists.openstack.org/pipermail/openstack-dev/2017-July/thread.html#119378
  5. http://lists.openstack.org/pipermail/openstack-dev/2017-July/119350.html
  6. http://lists.openstack.org/pipermail/openstack-dev/2017-July/thread.html#119388
  7. http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-05-24.log.html
  8. http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-05-24.log.html
  9. http://eavesdrop.openstack.org/irclogs/%23openstack-qa/%23openstack-qa.2017-05-28.log.html
  10. http://eavesdrop.openstack.org/irclogs/%23openstack-qa/%23openstack-qa.2017-05-28.log.html
  11. https://wiki.openstack.org/wiki/Successes
  12. http://lists.openstack.org/pipermail/openstack-dev/2017-July/thread.html#119378
  13. https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg106392.html
  14. http://lists.openstack.org/pipermail/openstack-dev/2017-June/thread.html#118808

#openstack #openstack-dev-digest

by Mike Perez at July 11, 2017 11:38 PM

Chris Dent

TC Report 28

It's been a while since I've done one of these. I was between employers and taking a break but am now back in the groove. There's been some activity in the interim, which I'll try to summarize below, but first notes from this evening's meeting.

I'm not fully up to speed yet, so this may be a bit sparse. It will pick up.

Meeting

A meeting was declared to "discuss the next steps in establishing the vision" and "wrap up the goals". Meeting notes and log.

Goals

The first topic was moving forward on the two community goals for the Queens cycle: splitting Tempest plugins into separate repos/projects, and moving policy and docs into code. There was no disagreement, especially as "champions" have stepped forward to shepherd all of the projects on the goals.

Vision Next Step

Again, mostly violent agreement on what to do: Accept the revised vision and see how it goes. To ensure that the document is effectively responsive to any necessary adjustments over time, it is being moved from reference to resolution. There's a stack of four changes starting with the original draft.

The vision itself has been slightly adjusted to be a bit more amenable to skimming and make the overarching goals a bit more obvious.

Other Meeting Stuff

How's Office Hours Going?

Mixed. But worth continuing the experiment. The hope is that office hours provide a reliable but casual way to interact with members of the TC. Thus far they have mostly been the TC talking amongst themselves, but several attendees at tonight's meeting reported that though they don't speak much in office hours, they do read. I'd personally like to see a lot more participation from anyone and everyone.

Some people would like to change the schedule a bit, as one of the three slots is a lot more popular than others and the least popular is very unpopular. The reaction? "patches accepted".

That 01:00 UTC Wednesday slot is designed to allow some interaction with people in the APAC region, with a long term goal of establishing future leaders in OpenStack from that region.

What's clear is that when there are some people there, a conversation happens containing relevant discussion. There's always something to talk about. For example though there were two agenda items in this meeting, new topics kept coming up.

The Diversity Report

Bitergia produced a report on gender diversity in OpenStack; though their data has some issues, the general conclusion (we could do a lot better) stands.

Glare and Glance Compatibility

In office hours earlier today there was some discussion about Glare's application to be an official project. This came up again in tonight's meeting and there is also a long thread on openstack-dev. There are concerns about overlap with Glance. If the overlap is such that an exactly concurrent API could be provided, this is potentially a very good thing. However, if the overlap is almost-but-not-quite then that could present problems. The mailing list thread has more information.

Pending Stuff

"big tent" and "hosted projects"

Two long email threads covered a lot of ground trying to work on the topic "better communicating what is OpenStack". The "big tent" term is misunderstood and misused, and the difference between an "official" (subject to TC governance) project and one that just happens to use OpenStack infra is also misunderstood, but sometimes manipulated for gain.

While it was decided to straightforwardly purge "big tent" from the governance repository, the discussion about hosted projects went very broad (the OpenStack adaptation of Godwin's law is that any discussion will eventually generalize to "What is OpenStack?"), with some consideration of no longer allowing use of some combination of

  • the openstack prefix in git repositories
  • openstack infrastructure in general

to just anyone who comes along. It's not clear how this was resolved, if at all. There was an impassioned plea to fix the real problem(s) instead of limiting open access for people who want to create more openness.

Does anyone recall where this topic landed, or if it hasn't yet landed, does anyone have good ideas on how to get it to land?

by Chris Dent at July 11, 2017 10:20 PM

SUSE Conversations

Is a Cloud-First Strategy Right for Your Business?

Gartner predicts that more than half of global enterprises that already use cloud today will adopt an all-in, or cloud-first, strategy by 2021. Should you do the same, and if so, where should you begin? First, it’s important to understand that a cloud-first strategy is a business strategy that aims to reduce IT costs by transitioning …

+read more

The post Is a Cloud-First Strategy Right for Your Business? appeared first on SUSE Blog. Mark_Smith

by Mark_Smith at July 11, 2017 03:36 PM

OpenStack Superuser

Gender diversity in the OpenStack community: A new report

A new report on gender diversity in OpenStack adds new numbers to the valuable debate about inclusion in open source projects. OpenStack has pioneered research in this field since 2014, when it started tracking gender for all its contributors and Foundation members.

The new Intel-sponsored report from Bitergia takes this research even further, specifically examining gender diversity and retention within the OpenStack community.

Companies have placed a steady importance on diversity and inclusion to further innovation — state the report’s authors Daniel Izquierdo of Bitergia, Nicole Huesman from Intel and Allison Price of the OpenStack Foundation — beginning with visible measurement and reporting of data in 2012-2013, which has spurred a high degree of focus, accountability and discussion on increasing the numbers of women and underrepresented minorities within the technology industry.

“Since the technology industry started measuring and publishing numbers on diversity, the dialogue and actions have increased,” says Nithya Ruff, Women of OpenStack Member and senior director at Comcast, in a foreword to the report.

The report highlights that all contributions — both technical and non-technical — must be recognized, and women often contribute more heavily in non-technical areas.

The authors of the analysis, led by open source analysis experts Bitergia, highlight in the recommendations that, based on evidence, inclusive communities have good documentation, on-boarding processes and mentors. It recommends tracking both the tenure and attrition of women in the OpenStack community, and studying the impact of specific policies and initiatives undertaken by the OpenStack Foundation.

You can download the 33-page report here. Have ideas for the OpenStack Foundation or other open source communities? Let us know on Twitter!

 

 

The post Gender diversity in the OpenStack community: A new report appeared first on OpenStack Superuser.

by Superuser at July 11, 2017 11:13 AM

SUSE Conversations

Zapraszamy na Letnią Akademię SUSE – Od zera do bohatera

There is usually a bit less work in the IT department during the summer holidays. It is worth taking advantage of that to broaden your knowledge of today's most important open source solutions during the second edition of the Letnia Akademia SUSE (SUSE Summer Academy). This year we will be talking about containerization (Kubernetes/MicroOS/Salt), cloud deployment (OpenStack), software-defined storage (Ceph), business continuity (Live Patching) …

+read more

The post Zapraszamy na Letnią Akademię SUSE – Od zera do bohatera appeared first on SUSE Blog. Rafal Kruschewski

by Rafal Kruschewski at July 11, 2017 10:46 AM

July 10, 2017

OpenStack Superuser

What you need to know about the upcoming OpenStack User Committee Elections

The User Committee Elections are around the corner. OpenStack has been a great success and is continuing to grow. Additional ecosystem partners are enhancing support for OpenStack, and it has become more and more important that the communities developing services around OpenStack lead and influence the software’s direction.

The OpenStack User Committee helps increase operator involvement, collects feedback from the community, works with user groups around the world, and parses through user survey data, to name a few. Users are critical, and the User Committee aims to represent the users.

They are looking to elect three (3) User Committee members for this election. These User Committee seats will be valid for a one-year term. For this election, the Active User Contributors (AUC) community will review the candidates and vote.

What makes a remarkable candidate for the User Committee?

Well, to start, the nominee has to be an individual member of the foundation who is an Active User Contributor (AUC). Additionally, below are a few things that will make you stand out:

  •      An OpenStack end-user and/or operator
  •      Organizer of an OpenStack local user meetup group
  •      An OpenStack contributor among the UC working groups
  •      Actively engaged in the OpenStack community
  •      Able to attend or be remotely engaged in the OpenStack Summits

Beyond the kinds of community activities you are already engaged in, the User Committee role adds some additional work. The User Committee usually interacts over e-mail to discuss any pending topics, and holds IRC meetings regularly. Prior to each summit, we spend a few hours going through the user survey results and analyzing the data.

You can nominate yourself or someone else by sending an email to the user-committee@lists.openstack.org mailing-list, with the subject: “UC candidacy” from July 31 – August 11, 05:59 UTC. Voting for the User Committee (UC) members will be open on August 14 and will remain open until August 18, 11:59 UTC. Additional details and the process for nomination can be found here.

We look forward to receiving your submissions!

Cover Photo // CC BY NC

The post What you need to know about the upcoming OpenStack User Committee Elections appeared first on OpenStack Superuser.

by Superuser at July 10, 2017 11:35 AM

Mirantis

Introducing Virtlet: VMs and Containers on one OpenContrail network in Kubernetes — a new direction for NFV?

Virtlet is a Kubernetes CRI (Container Runtime Interface) implementation for running VM-based pods on Kubernetes clusters.

by Jakub Pavlik at July 10, 2017 05:30 AM

July 07, 2017

StackHPC Team Blog

Ethernet's future is approaching - fast

Mellanox certainly know how to throw a party.

We joined them at the Sky Garden, a private venue atop one of London's most iconic skyscrapers. With a hint of "Bond villain's lair" about it, this luxurious location made a perfect backdrop to announce some special networking products.

The Sky Garden

During the launch, StackHPC's CEO John Taylor discussed the scale of the data challenges of the Square Kilometre Array with Mellanox CEO Eyal Waldman.

John Taylor and Eyal Waldman

Spectrum-2 supports ground-breaking Ethernet link speeds: 200Gbits/s and 400Gbits/s were announced. 400Gbits/s is pushing the envelope to the point where it doesn't even have a physical cabling standard ratified yet.

Customers with the most demanding network-intensive workloads may hope to soon have access to the Mellanox ConnectX-6 200Gbits/s NIC. The next generation NIC also promises enhanced support for Open vSwitch offloads. There isn't a NIC announced for 400Gbits/s yet...

The Future of SDN

Aside from raw speed, there were also some really interesting features for SDN. The Spectrum-2 ASIC includes support for the emerging P4 language, embodying the next generation of SDN, and we will be watching for details of that as they become public.

Open Ethernet supported

Open Ethernet

Mellanox CEO Eyal Waldman also affirmed Spectrum-2 support for Open Ethernet, enabling customers to choose an alternative network OS (Mellanox OS or Cumulus Linux) - although neither choice would be considered open source.

by Stig Telfer at July 07, 2017 04:40 PM

OpenStack Superuser

A simple way to create OpenStack Swift reports

It’s no secret that I love OpenStack Swift. While it is not always a two-way relationship, I use Swift as much as I can: mostly for long-term backups, to serve static websites and even streaming.

While the functionalities are awesome, it’s also important to get the accounting/usage information of the platform. Out of the box, Swift does not allow even an administrator to access accounting information from a given account. The “standard” approach is to use the Telemetry feature of OpenStack (aka Ceilometer), but I’m not a fan of that project either. In my opinion, telemetry pumps so much data that in most cases it’s overkill; I prefer a simpler approach.

To create a report of Swift usage, we need to use the Reseller Admin concept in Swift to query account statistics from a single admin-level user.  The reseller role (named “ResellerAdmin” by default) can operate on any Swift account.

While the concept is a bit tricky (and undocumented as well), the truth is that it’s quite straightforward to enable. Create a “ResellerAdmin” role on OpenStack with the command openstack role create ResellerAdmin and grant the role to the user that needs to access the containers, e.g. the admin user.
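
In practice that boils down to two commands (assuming the default admin user and project names of a stock deployment):

openstack role create ResellerAdmin
openstack role add --project admin --user admin ResellerAdmin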

Edit the Swift proxy-server.conf (filter:keystone section) so that it includes the reseller_admin_role setting, as shown below.

[filter:keystone]
use = egg:swift#keystoneauth
operator_roles = admin, SwiftOperator
reseller_admin_role = ResellerAdmin
reseller_prefix = AUTH_
is_admin = true
cache = swift.cache

Now the admin user can enumerate the projects and get statistics of all the projects and containers. It’s now easy enough to cycle through all the projects and get the used bytes, as shown below:

$ swift stat --os-project-name myproject
      Account: AUTH_c9f567ce0c7f484e918ac8fc798f988f
      Containers: 4
      Objects: 325   
      Bytes: 101947377850 
      Containers in policy "policy-0": 4
      Objects in policy "policy-0": 325
      Bytes in policy "policy-0": 101947377850
      X-Account-Project-Domain-Id: default
      X-Timestamp: 1487950953.36228
      X-Trans-Id: tx49e7b3d4e1a24f529fbc6-00594fb813
      Content-Type: text/plain; charset=utf-8
      Accept-Ranges: bytes
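
From there, a small loop is enough to build a usage report across every project – a sketch that assumes admin credentials are already loaded in the environment and that project names contain no spaces:

#!/bin/bash
# print "<project> <bytes used>" for every project, relying on the ResellerAdmin role
for project in $(openstack project list -f value -c Name); do
    bytes=$(swift stat --os-project-name "$project" | awk '/^ *Bytes:/ {print $2}')
    echo "$project ${bytes:-0}"
done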

Giuseppe Paterno’ is an IT architect and security expert. This post first appeared on Paterno’s blog.

Superuser is always interested in community content, get in touch: editorATsuperuser.org

Cover Photo // CC BY NC

The post A simple way to create OpenStack Swift reports appeared first on OpenStack Superuser.

by Giuseppe Paterno' at July 07, 2017 01:12 PM

Julie Pichon

TripleO Deep Dive: Internationalisation in the UI

Yesterday, as part of the TripleO Deep Dives series I gave a short introduction to internationalisation in TripleO UI: the technical aspects of it, as well as a quick overview of how we work with the I18n team.

You can catch the recording on BlueJeans or YouTube, and below's a transcript.

~

Life and Journey of a String

Internationalisation was added to the UI during Ocata - just a release ago. Florian implemented most of it and did the lion's share of the work, as can be seen on the blueprint if you're curious about the nitty-gritty details.

Addition to the codebase

Here's an example patch from during the transition. On the left you can see how things were hard-coded, and on the right you can see the new defineMessages() interface we now use. Obviously new patches should directly look like on the right hand-side nowadays.

The defineMessages() dictionary requires a unique id and default English string for every message. Optionally, you can also provide a description if you think there could be confusion or to clarify the meaning. The description will be shown in Zanata to the translators - remember they see no other context, only the string itself.

For example, a string might sound active like if it were related to an action/button but actually be a descriptive help string. Or some expressions are known to be confusing in English - "provide a node" has been the source of multiple discussions on list and live so might as well pre-empt questions and offer additional context to help the translators decide on an appropriate translation.

Extraction & conversion

Now we know how to add an internationalised string to the codebase - how do these get extracted into a file that will be uploaded to Zanata?

All of the following steps are described in the translation documentation in the tripleo-ui repository. Assuming you've already run the installation steps (basically, npm install):

$ npm run build

This does a lot more than just extracting strings - it prepares the code for being deployed in production. Once this ends you'll be able to find your newly extracted messages under the i18n directory:

$ ls i18n/extracted-messages/src/js/components

You can see the directory structure is kept the same as the source code. And if you peek into one of the files, you'll note the content is basically the same as what we had in our defineMessages() dictionary:

$ cat i18n/extracted-messages/src/js/components/Login.json 
[
  {
    "id": "UserAuthenticator.authenticating",
    "defaultMessage": "Authenticating..."
  },
  {
    "id": "Login.username",
    "defaultMessage": "Username"
  },
  {
    "id": "Login.usernameRequired",
    "defaultMessage": "Username is required."
  },
[...]

However, JSON is not a format that Zanata understands by default. I think the latest version we upgraded to, or the next one might have some support for it, but since there's no i18n JSON standard it's somewhat limited. In open-source software projects, po/pot files are generally the standard to go with.

$ npm run json2pot

> tripleo-ui@7.1.0 json2pot /home/jpichon/devel/tripleo-ui
> rip json2pot ./i18n/extracted-messages/**/*.json -o ./i18n/messages.pot

> [react-intl-po] write file -> ./i18n/messages.pot ✔️

$ cat i18n/messages.pot 
msgid ""
msgstr ""
"POT-Creation-Date: 2017-07-07T09:14:10.098Z\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"X-Generator: react-intl-po\n"


#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.noNodesToRegister] - undefined
msgid ""No Nodes To Register""
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/NodesToolbar/NodesToolbar.json
#. [Toolbar.activeFilters] - undefined
#: ./i18n/extracted-messages/src/js/components/validations/ValidationsToolbar.json
#. [Toolbar.activeFilters] - undefined
msgid "Active Filters:"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.addNew] - Small button, to add a new Node
msgid "Add New"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/plan/PlanFormTabs.json
#. [PlanFormTabs.addPlanName] - Tooltip for "Plan Name" form field
msgid "Add a Plan Name"
msgstr ""
[...]

This messages.pot file is what will be automatically uploaded to Zanata.

Infra: from the git repo, to Zanata

The following steps are done by the infrastructure scripts. There's infra documentation on how to enable translations for your project, in our case as the first internationalised JavaScript project we had to update the scripts a little as well. This is useful to know if an issue happens with the infra jobs; debugging will probably bring you here.

The scripts live in the project-config infra repo and there are three files of interest for us: upstream_translation_update.sh, propose_translation_update.sh and common_translations_update.sh.

In this case, upstream_translation_update.sh is the file of interest to us: it simply sets up the project on line 76, then sends the pot file up to Zanata on line 115.

What does "setting up the project" entail? It's a function in common_translations_update.sh that pretty much runs the steps we talked about in the previous section, and also creates a config file to talk to Zanata.

Monitoring the post jobs

Post jobs run after a patch has already merged - usually to upload tarballs where they should be, update the documentation pages, etc, and also upload messages catalogues onto Zanata. Being a 'post' job however means that if something goes wrong, there is no notification on the original review so it's easy to miss.

Here's the OpenStack Health page to monitor 'post' jobs related to tripleo-ui. Scroll to the bottom - hopefully tripleo-ui-upstream-translation-update is still green! It's good to keep an eye on it although it's easy to forget. Thankfully, AJaeger from #openstack-infra has been great at filing bugs and letting us know when something does go wrong.

Debugging when things go wrong: an example

We had a couple of issues whereby a linebreak gets introduced into one of the strings, which works fine in JSON but breaks our pot file. If you look at the content from the bug (the full logs are no longer accessible):

2017-03-16 12:55:13.468428 | + zanata-cli -B -e push --copy-trans False
[...]
2017-03-16 12:55:15.391220 | [INFO] Found source documents:
2017-03-16 12:55:15.391405 | [INFO]            i18n/messages
2017-03-16 12:55:15.531164 | [ERROR] Operation failed: missing end-quote

You'll notice the first line is the last function we call in the upstream_translation_update.sh script; for debugging that gives you an idea of the steps to follow to reproduce. The upstream Zanata instance also lets you create toy projects, if you want to test uploads yourself (this can't be done directly on the OpenStack Zanata instance.)

This particular newline issue has popped up a couple of times already. We're treating it with band-aids at the moment, ideally we'd get a proper test on the gate to prevent it from happening again: this is why this bug is still open. I'm not very familiar with JavaScript testing and haven't had a chance to look into it yet; if you'd like to give it a shot that'd be a useful contribution :)

Zanata, and contributing translations

The OpenStack Zanata instance lives at https://translate.openstack.org. This is where the translators do their work. Here's the page for tripleo-ui, you can see there is one project per branch (stable/ocata and master, for now). Sort by "Percent Translated" to see the languages currently translated. Here's an example of the translator's view, for Spanish: you can see the English string on the left, and the translator fills in the right side. No context! Just strings.

At this stage of the release cycle, the focus would be on 'master,' although it is still early to do translations; there is a lot of churn still.

If you'd like to contribute translations, the I18n team has good documentation about how to go about how to do it. The short version: sign up on Zanata, request to join your language team, once you're approved - you're good to go!

Return of the string

Now that we have our strings available in multiple languages, it's time for another infra job to kick in and bring them into our repository. This is where propose_translation_update.sh comes in. We pull the po files from Zanata, convert them to JSON, then do a git commit that will be proposed to Gerrit.
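
Paraphrased as shell, the job does something along these lines (a simplified sketch, not the actual script – the paths and the commit message are illustrative):

# pull the translated .po files down from Zanata
zanata-cli -B -e pull

# convert them back into the JSON catalogues the UI loads at runtime
npm run po2json

# propose the result as a normal Gerrit review
git add i18n/
git commit -m "Imported Translations from Zanata"
git review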

The cleanup step does more than it might seem. It checks if files are translated over a certain ratio (~75% for code), which avoids adding new languages when there might only be one or two words translated (e.g. someone just testing Zanata to see how it works). Switching to your language and yet having the vast majority of the UI still appear in English is not a great user experience.

In theory, files that were added but are now below 40% should get automatically removed, however this doesn't quite work for JavaScript at the moment - another opportunity to help! Manual cleanups can be done in the meantime, but it's a rare event so not a major issue.

Monitoring the periodic jobs

Zanata is checked once a day every morning, there is an OpenStack Health page for this as well. You can see there are two jobs at the moment (hopefully green!), one per branch: tripleo-ui-propose-translation-update and tripleo-ui-propose-translation-update-ocata. The job should run every day even if there are no updates - it simply means there might not be a git review proposed at the end.

We haven't had issues with the periodic job so far, though the debugging process would be the same: figure out based on the failure if it is happening at the infra script stage or in one of our commands (e.g. npm run po2json), try to reproduce and fix. I'm sure super-helpful AJaeger would also let us know if he were to notice an issue here.

Automated patches

You may have seen the automated translations updates pop up on Gerrit. The commit message has some tips on how to review these: basically don't agonise over the translation contents as problems there should be handled in Zanata anyway, just make sure the format looks good and is unlikely to break the code. A JSON validation tool runs during the infra prep step in order to "prettify" the JSON blob and limit the size of the diffs, therefore once the patch  makes it out to Gerrit we know the JSON is well-formed at least.

Try to review these patches quickly to respect the translators' work. It's not very nice to spend a lot of time translating a project and yet not have your work included because no one bothered to merge it :)

A note about new languages...

If the automated patch adds a new language, there'll be an additional step required after merging the translations in order to enable it: adding a string with the language name to a constants file. Until recently, this took 3 or 4 steps - thanks to Honza for making it much simpler!

This concludes the technical journey of a string. If you'd like to help with i18n tasks, we have a few related bugs open. They go from very simple low-hanging-fruits you could use to make your first contribution to the UI, to weird buttons that have translations available yet show in English but only in certain modals, to the kind of CI resiliency tasks I linked to earlier. Something for everyone! ;)

Working with the I18n team

It's really all about communication. Starting with...

Release schedule and string freezes

String freezes are noted on the main schedule but tend to fit the regular cycle-with-milestones work. This is a problem for a cycle-trailing project like tripleo-ui as we could be implementing features up to 2 weeks after the other projects, so we can't freeze strings that early.

There were discussions at the Atlanta PTG around whether the I18n team should care at all about projects that don't respect the freeze deadlines. That would have made it impossible for projects like ours to ever make it onto the I18n official radar. The compromise was that cycle-trailing projects should have an I18n cross-project liaison who communicates with the I18n PTL and team to inform them of deadlines, and who observes only the Hard Freeze, ignoring the Soft Freeze.

This will all be documented under an i18n governance tag; while waiting for it the notes from the sessions are available for the curious!

What's a String Freeze again?

The two are defined on the schedule: soft freeze means not allowing changes to strings, as it invalidates the translator's work and forces them to retranslate; hard freeze means no additions, changes or anything else in order to give translators a chance to catch up.

When we looked at Zanata earlier, there were translation percentages beside each language: the goal is always the satisfaction of reaching 100%. If we keep adding new strings then the goalpost keeps moving, which is discouraging and unfair.

Of course there's also an "exception process" when needed, to ask for permission to merge a string change with an explanation or at least a heads-up, by sending an email to the openstack-i18n mailing list. Not to be abused :)

Role of the I18n liaison

...Liaise?! Haha. The role is defined briefly on the Cross-Projects Liaison wiki page. It's much more important toward the end of the cycle, when the codebase starts to stabilise: there are fewer changes, and translators start looking at their work so it can be included in the release.

In general it's good to hang out on the #openstack-i18n IRC channel (very low traffic), attend the weekly meeting (it alternates times), be available to answer questions, and keep the PTL informed of the I18n status of the project. In the case of cycle-trailing projects (quite a new release model still), it's also important to be around to explain the deadlines.

A couple of examples having an active liaison helps with:

  • Toward the end or after the release, once translations into the stable branch have settled, the stable translations get copied into the master branch on Zanata. The strings should still be fairly similar at that point and it avoids translators having to re-do the work. It's a manual process, so you need to let the I18n PTL know when there are no longer changes to stable/*.
  • Last cycle, because the cycle-trailing status of tripleo-ui was not correctly documented, a Zanata upgrade was planned right after the main release - which for us ended up being right when the codebase had stabilised enough and several translators had planned to be most active. Would have been solved with better, earlier communication :)

Post-release

After the Ocata release, I sent a few screenshots of tripleo-ui to the i18n list so translators could see the result of their work. I don't know if anybody cared :-) But unlike Horizon, which has an informal test system available for translators to check their strings during the RC period, most of the people who volunteered translations had no idea what the UI looked like. It'd be cool if we could offer a test system with regular string updates next release - maybe just an undercloud on the new RDO cloud? Deployment success/failures strings wouldn't be verifiable but the rest would, while the system would be easier to maintain than a full dev TripleO environment - better than nothing. Perhaps an idea for the Queens cycle!

The I18n team has a priority board on the Zanata main page (only visible when logged in I think). I'm grateful to see TripleO UI in there! :) Realistically we'll never move past Low or perhaps Medium priority which is fair, as TripleO doesn't have the same kind of reach or visibility that Horizon or the installation guides do. I'm happy that we're included! The OpenStack I18n team is probably the most volunteer-driven team in OpenStack. Let's be kind, respect string freezes and translators' time! \o/

</braindump>

Tagged with: open-source, openstack, talk-transcript, tripleo

by jpichon at July 07, 2017 12:45 PM

StackHPC Team Blog

Our loss is Norway's gain: StackHPC is recruiting

A new life in Norway finally beckons for Steve and Linn. In his time at StackHPC, Steve has contributed hugely to our vision for game-changing HPC infrastructure monitoring.

Now we are looking for fresh talent to take up the mantle and lead our ongoing efforts. If you're interested in a role with us at StackHPC, find more information with our recruiter.

StackHPC team in the Castle

Room for you at our usual table in the Castle Inn, Cambridge?

Good luck Steve and Linn from all of us at StackHPC!

by Stig Telfer at July 07, 2017 11:40 AM

July 06, 2017

OpenStack Superuser

What’s new in K8s 1.7

The latest version of Kubernetes has just been released, and as usual, it brings us some new features. Ihor Dvoretskyi, a member of the Kubernetes 1.7 release team, gives a tour of the most notable features that landed with this release.

Summer is a special time for the Kubernetes community. The first stable Kubernetes release landed in summer – and July is when the worldwide community celebrates the birthday of the project.

Kubernetes 1.7 is a notable milestone for the entire Kubernetes ecosystem. While the first 2017 release, Kubernetes 1.6, had the goal of enhancing existing features, for 1.7 the focus is on delivering brand-new features that bring new functionality to the product.

 

From the graph above, you’ll notice that the total number of features in both releases is almost equal (29 compared to 28), while the number of “alpha” features has more than doubled (eight compared to 18).

Does it mean that Kubernetes 1.7 is less stable than 1.6, because of the amount of alpha features?

No!

The new Kubernetes release brings us new functionality, but that doesn’t affect the existing stable components. There are numerous new features that have just been developed and are currently in non-production status, but they describe the path and trends of how and where Kubernetes, as a solid product, moves forward. And of course, we expect that features labelled as alpha today will be promoted to “beta” and “stable” in the next few releases.

So, what are the new features that  you can try out today with Kubernetes 1.7?

Security enhancements

  • Encrypting secrets in etcd – this protects against unexpected access via the etcd API, etcd backups, etc.
  • NetworkPolicy is now GA – with this feature, users can create various NetworkPolicy objects which select groups of pods and define how those pods should be allowed to communicate with each other.
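
For instance, a policy that only lets pods labelled app=frontend reach pods labelled app=backend could look roughly like this (the name, namespace and labels are made up for illustration):

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
EOF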

Stateful workload enhancements

  • Local Storage Management, that enables ephemeral and durable access to the local storage;
  • StatefulSet updates – this feature is now in beta and allows updating resource limits or requested resources, container images, environment variables, container entry point commands or parameters, configuration files, etc.

Runtime enhancements

  • Containerd-CRI is now in alpha. Containerd, an open-source project under CNCF governance originally developed by Docker, has some valuable benefits when used as the Kubernetes runtime compared to Docker itself. Containerd consumes fewer resources than Docker – it’s a subset of Docker and does not bring any resource overhead.

Federation enhancements

  • Policy-based Federated Resource Placement is now in alpha. This feature enables placement policies for federated clusters, based on company conventions, external regulation, pricing and performance requirements, etc.

You can find more detailed information about what Kubernetes 1.7 brings  on the Kubernetes blog. If you are interested in contributing to the OpenStack area at Kubernetes – feel free to join us at OpenStack Special Interest Group.

I’ll also be talking about what’s new online at the Kubernauts Worldwide Meetup July 12.

Dvoretskyi is a program manager at Mirantis, focused on upstream Kubernetes-related efforts. He also acts as a product manager in the Kubernetes community and has been responsible for the features track on the Kubernetes 1.6 and 1.7 release teams.

Superuser is always interested in community content. Email editorATopenstack.org for details.

Cover Photo // CC BY NC

The post What’s new in K8s 1.7 appeared first on OpenStack Superuser.

by Ihor Dvoretskyi at July 06, 2017 02:56 PM

Mirantis

Are You Certifiable? Why Cloud Technology Certifications Matter

We talked to Mariela Gagnon of Cre8Hires and consultant Jens Soldners about one of your greatest advantages: technical certifications.

by Nick Chase at July 06, 2017 02:12 AM

July 05, 2017

Corey Bryant

OpenStack in a Snap

OpenStack is complex and many OpenStack community members are working hard to make the deployment and operation of OpenStack easier. Much of this time is focused on tools such as Ansible, Puppet, Kolla, Juju, Triple-O, Chef (to name a few). But what if we step down a level and also make the package experience easier?

With snaps we’re working on doing just that. Snaps are a new way of delivering software. The following description from snapcraft.io provides a good summary of the core benefits of snaps:

“Snaps are quick to install, easy to create, safe to run, and they update automatically and transactionally so your app is always fresh and never broken.”

Bundled software

A single snap can deliver multiple pieces of software from different sources to provide a solution that gets you up and running fast.  You’ll notice that installing a snap is quick. That’s because when you install a snap, that single snap bundles all of its dependencies.  That’s a bit different from installing a deb, where all of the dependencies get pulled down and installed separately.

Snaps are easy to create

In my time working on Ubuntu, I’ve spent much of it working on Debian packaging for OpenStack. It’s a niche skill that takes quite a bit of time to understand the nuances of.  When compared with snaps, the difference in complexity between deb packages and snaps is like night and day. Snaps are just plain simple to work on, and even quite fun!

A few more features of snaps

  • Each snap is installed in its own read-only squashfs filesystem.
  • Each snap is run in a strict environment sandboxed by AppArmor and seccomp policy.
  • Snaps are transactional. New versions of a snap install to a new read-only squashfs filesystem. If an upgrade fails, it will roll-back to the old version.
  • Snaps will auto-refresh when new versions are available.
  • OpenStack Snaps are guaranteed to be aligned with OpenStack’s upper-constraints. Packagers no longer need to maintain separate packages for the OpenStack dependency chain. Woo-hoo!

Introducing the OpenStack Snaps!

We currently have the following projects snapped:

  • Keystone – This snap provides the OpenStack identity service.
  • Glance – This snap provides the OpenStack image service.
  • Neutron – This snap specifically provides the ‘neutron-server’ process as part of a snap based OpenStack deployment.
  • Nova – This snap provides the Nova controller component of an OpenStack deployment.
  • Nova-hypervisor – This snap provides the hypervisor component of an OpenStack deployment, configured to use Libvirt/KVM + Open vSwitch which are installed using deb packages.  This snap also includes nova-lxd, allowing for use of nova-lxd instead of KVM.

This is enough to get a minimal working OpenStack cloud.  You can find the source for all of the OpenStack snaps on github.  For more details on the OpenStack snaps please refer to the individual READMEs in the upstream repositories. There you can find more details for managing the snaps, such as overriding default configs, restarting services, setting up aliases, and more.

Want to create your own OpenStack snap?

Check out the snap cookie cutter.

I’ll be writing a blog post soon that walks you through using the snap cookie cutter. It’s really simple and will help get the creation of a new OpenStack snap bootstrapped in no time.

Testing the OpenStack snaps

We’ve been using a simple script for initial testing of the OpenStack snaps. The script installs the snaps on a single node and provides additional post-install configuration for services. To try it out:

git clone https://github.com/openstack-snaps/snap-test
cd snap-test
./snap-deploy

At this point we’ve been doing all of our testing on Ubuntu Xenial (16.04).  Also note that this will install and configure quite a bit of software on your system so you’ll likely want to run it on a disposable machine.

Tracking OpenStack

Today you can install snaps from the edge channel of the snap store. For example:

sudo snap install --edge keystone

The OpenStack team is working toward getting CI/CD in place to enable publishing snaps across tracks for OpenStack releases (i.e. a track for ocata, another track for pike, etc.). Within each track will be four different channels. The edge channel for each track will contain the tip of the OpenStack project’s corresponding branch, with the beta, candidate and release channels being reserved for released versions. This should result in an experience such as:

sudo snap install --channel=ocata/stable keystone
sudo snap install --channel=pike/edge keystone

Poking around

Snaps have various environment variables available to them that simplify the creation of the snap. They’re all documented here.  You probably won’t need to know much about them to be honest, however there are a few locations that you’ll want to be familiar with once you’ve installed a snap:

$SNAP == /snap/<snap-name>/current

This is where the snap and all of its files are mounted. Everything here is read-only. In my current install of keystone, $SNAP is /snap/keystone/91. Fortunately you don’t need to know the current version number as there’s a symlink to that directory at /snap/keystone/current.

$ ls /snap/keystone/current/
bin                     etc      pysqlite2-doc        usr
command-manage.wrapper  include  snap                 var
command-nginx.wrapper   lib      snap-openstack.yaml
command-uwsgi.wrapper   meta     templates

$ ls /snap/keystone/current/bin/
alembic                oslo-messaging-send-notification
convert-json           oslo-messaging-zmq-broker
jsonschema             oslo-messaging-zmq-proxy
keystone-manage        oslopolicy-checker
keystone-wsgi-admin    oslopolicy-list-redundant
keystone-wsgi-public   oslopolicy-policy-generator
lockutils-wrapper      oslopolicy-sample-generator
make_metadata.py       osprofiler
mako-render            parse_xsd2.py
mdexport.py            pbr
merge_metadata.py      pybabel
migrate                snap-openstack
migrate-repository     sqlformat
netaddr                uwsgi
oslo-config-generator

$ ls /snap/keystone/current/usr/bin/
2to3               idle     pycompile     python2.7-config
2to3-2.7           pdb      pydoc         python2-config
cautious-launcher  pdb2.7   pydoc2.7      python-config
compose            pip      pygettext     pyversions
dh_python2         pip2     pygettext2.7  run-mailcap
easy_install       pip2.7   python        see
easy_install-2.7   print    python2       smtpd.py
edit               pyclean  python2.7

$ ls /snap/keystone/current/lib/python2.7/site-packages/
...

$SNAP_COMMON == /var/snap/<snap-name>/common

This directory is used for system data that is common across revisions of a snap. This is where you’ll override default config files and access log files.

$ ls /var/snap/keystone/common/
etc  fernet-keys  lib  lock  log  run

$ sudo ls /var/snap/keystone/common/etc/
keystone  nginx  uwsgi

$ ls /var/snap/keystone/common/log/
keystone.log  nginx-access.log  nginx-error.log  uwsgi.log

Strict confinement

The snaps all run under strict confinement, where each snap is run in a restricted environment that is sandboxed with seccomp and AppArmor policy.  More details on snap confinement can be viewed here.

New features/updates coming for snaps

There are a few features and updates coming for snaps that I’m looking forward to:

  • We’re working on getting libvirt AppArmor policy in place so that the nova-hypervisor snap can access qcow2 backing files.
    • For now, as a work-around, you can put virt-aa-helper in complain mode: sudo aa-complain /usr/lib/libvirt/virt-aa-helper
  • We’re also working on getting additional snapd interface policy in place that will enable network connectivity for deployed instances.
    • For now you can install the nova-hypervisor snap in devmode, which disables security confinement: snap install --devmode --edge nova-hypervisor
  • Auto-connecting nova-hypervisor interfaces. We’re working on getting the interfaces for the nova-hypervisor defined automatically at install time.
    • Interfaces define the AppArmor and seccomp policy that enables a snap to access resources on the system.
    • For now you can manually connect the required interfaces as described in the nova-hypervisor snap’s README.
  • Auto-alias support for commands. We’re working on getting auto-alias support defined for commands across the snaps, where aliases will be defined automatically at install time.
    • This enables use of the traditional command names. Instead of ‘nova.manage db sync‘ you’ll be able to issue ‘nova-manage db sync’ right after installing the snap.
    • For now you can manually enable aliases after the snap is installed, such as ‘snap alias nova.manage nova-manage’. See the snap READMEs for more details.
  • Auto-alias support for daemons.  Currently snappy only supports aliases for commands (not daemons).  Once alias support is available for daemons, we’ll set them up to be automatically configured at install time.
    • This enables use of the traditional unit file names. Instead of ‘systemctl restart snap.nova.nova-compute’ you’ll be able to issue ‘systemctl restart nova-compute’.
  • Asset tracking for snaps. This will enable tracking of the versions used to build the snap, which can be re-used in future builds.

If you’d like to chat more about snaps you can find us on IRC in #openstack-snaps on freenode. We welcome your feedback and contributions!

Thanks and have fun!

Corey


by coreycb at July 05, 2017 06:37 PM

OpenStack Superuser

Laying the groundwork for an OpenStack Contributor portal

Every month we have people asking on IRC or the dev mailing list who are interested in working on OpenStack. Sometimes they’re given different answers from people, or worse, no answer at all.

Suggestion: Let’s pool our efforts together to create some common documentation so that all teams in OpenStack can benefit.

First, it’s important to note that we’re not just talking about code projects here. OpenStack contributions come in many forms such as running meet-ups, identifying use cases (Product Working Group), documentation, testing, etc. We want to make sure those potential contributors feel welcomed too!

What is common documentation? Things like setting up Git and the many accounts you need to set up in order to contribute (Gerrit, Launchpad, an OpenStack Foundation account). Not every team will use all of the common documentation, but the point is that one or more projects will use each piece. Having the common documentation worked on by various projects will help prevent duplicated effort and inconsistent documentation, and hopefully yield more accurate information. A team might use special tools to do their work. These can also be integrated into this idea as well.

Once we have common documentation we can have something like:

1. Choose your own adventure: I want to contribute by code

2. What service type are you interested in? (Database, block storage, compute)

3. Here’s step-by-step common documentation to setting up Git, IRC, Mailing Lists, accounts, etc.

4. A service-type project might choose to also include additional documentation in that flow for special tools, etc.

Important things to note in this flow:

* How do you want to contribute?

* Here are **clear** names that identify the team. Not code names like Cloud Kitty, Cinder, etc.

* The documentation should really aim to not be daunting:

* Someone should be able to glance at it and feel like they can finish things in five minutes. Not be yet another tab left in their browser that they’ll eventually forget about.

* No wall of text!

* Use screen shots

* Avoid covering every issue you could hit along the way.

A mock-up of the contributor portal. More options below.

Examples of More Simple Documentation

I worked on some documentation for the Upstream University preparation that has received excellent feedback and comes close to these suggestions: IRC [1], Git [2] and Account Setup [3].

500-foot birds-eye view

There will be a Contributor landing page on the openstack.org website. Existing contributors will find reference links to quickly jump to things. New contributors will find a banner at the top of the page to direct them to the choose your own adventure to contributing to OpenStack, with ordered documentation flow that reuses existing documentation when necessary. Picture also a progress bar somewhere to show how close you are to being ready to contribute to whatever team. Of course there are a lot of other fancy things we can come up with, but I think getting something up as an initial pass would be better than what we have today.

Here’s an example of what the sections/chapters could look like:

Code

* Volumes (Cinder)

* IRC

* Git

* Account Setup

* Generating Configs

* Compute (Nova)

* IRC

* Git

* Account Setup

* Something about hypervisors (matrix?) –

Use Cases

* Products (Product Working Group)

* IRC

* Git

* Use Case format

There are some rough mock up ideas [4]. Probably Sphinx will be fine for this. Potentially we could use this content for conference lunch and learns, upstream university, and the on-boarding events at the Forum.

What do you all think?

To get involved, you can reach out to Perez on IRC or Twitter, where his handle is Thingee or over email: mikeATopenstack.org

[1] – http://docs.openstack.org/upstream-training/irc.html

[2] – http://docs.openstack.org/upstream-training/git.html

[3] – http://docs.openstack.org/upstream-training/accounts.html

[4] – https://www.dropbox.com/s/o46xh1cp0sv0045/OpenStack%20contributor%20portal.pdf?dl=0

Mike Perez is the cross-project developer coordinator at the OpenStack Foundation.

Cover Photo // CC BY NC

The post Laying the groundwork for an OpenStack Contributor portal appeared first on OpenStack Superuser.

by Mike Perez at July 05, 2017 03:05 PM

Alessandro Pilotti

Hyper-V RemoteFX in OpenStack

We’ve added support for RemoteFX for Windows / Hyper-V Server 2012 R2 back in Kilo, but the highly anticipated Windows / Hyper-V Server 2016 comes with some nifty new features that we’re excited about!

In case you are not familiar with this feature, it allows you to virtualize your GPUs and share them across virtual machine instances by adding virtual graphics devices. This leads to a richer RDP experience especially for VDI on OpenStack, as well as the benefit of having a GPU on your instances, enhancing GPU-intensive applications (CUDA, OpenCL, etc).

If you are curious, you can take a look at one of our little experiments. We’ve run a few GPU-intensive demos on identical guests with and without RemoteFX. The difference was very obvious between the two. You can see the recording here.

One of the most interesting features RemoteFX brings in terms of improving the user experience is device redirection. This allows you to connect your local devices (USBs, smart cards, VoIP devices, webcams, etc.) to RemoteFX-enabled VMs through your RDP client. A detailed list of devices you can redirect through your RDP session can be found here.

Some of the new features for RemoteFX in Windows / Hyper-V Server 2016 are:

  • 4K resolution option
  • 1GB dedicated VRAM (available choices: 64MB, 128MB, 256MB, 512MB, 1GB) and up to another 1GB shared VRAM
  • Support for Generation 2 VMs
  • OpenGL and OpenCL API support
  • H.264/AVC codec investment
  • Improved performance

One important thing worth mentioning is the fact that RemoteFX allows you to overcommit your GPUs, the same way you can overcommit disk, memory, or vCPUs!

All of this sounds good, but how can you know if you can enable RemoteFX? All you need for this is a compatible GPU that passes the minimum requirements:

  • it must support DirectX 11.0 or newer
  • it must support WDDM 1.2 or newer
  • the Hyper-V feature must be installed.

If you pass these simple requirements, all you have to do to enable the feature is to run this PowerShell command:

Install-WindowsFeature RDS-Virtualization

 

Hyper-V has to be configured to use RemoteFX. This can be done by opening the Hyper-V Manager, going to Hyper-V Settings and, under Physical GPUs, checking the Use this GPU with RemoteFX checkbox.

For more information about RemoteFX requirements and recommended RemoteFX-compatible GPUs, read this blog post.

In order to take advantage of all these features, the RDP client must be RemoteFX-enabled (Remote Desktop Connection 7.1 or newer).

Please do note that the instance’s guest OS must support RemoteFX as well. Incompatible guests will not be able to fully benefit from this feature. For example, Windows 10 Home guests are not compatible with RemoteFX, while Windows 10 Enterprise and Pro guests are. This fact can easily be checked by looking up the Video Graphics Adapter in the guest’s Device Manager.

 

RemoteFX inside a guest VM

 

After the RDS-Virtualization feature has been enabled, the nova-compute service running on the Hyper-V compute node will have to be configured as well. The following config option must be set to True in nova-compute's nova.conf file:

[hyperv]
enable_remotefx = True

 

In order to spawn an instance with RemoteFX enabled via OpenStack, all you have to do is provide the instance with a few flavor extra_specs:

  • os:resolution:  guest VM screen resolution size.
  • os:monitors:  number of guest VM monitors.
  • os:vram:  guest VM VRAM amount. Only available on Windows / Hyper-V Server 2016.

There are a few things to take into account:

  1. Only a subset of resolution sizes are available for RemoteFX. Any other given resolution size will be met with an error.
  2. The maximum number of monitors allowed is dependent on the requested resolution. Requesting a larger number of monitors than the maximum allowed per requested resolution size will be met with an error.
  3. Only the following VRAM amounts can be requested: 64, 128, 256, 512, 1024.
  4. On Windows / Hyper-V Server 2012 R2, RemoteFX can only be enabled on Generation 1 VMs.

The available resolution sizes and maximum number of monitors are:
For Windows / Hyper-V Server 2012 R2:

1024x768:   4
1280x1024:  4
1600x1200:  3
1920x1200:  2
2560x1600:  1

For Windows / Hyper-V Server 2016:

1024x768: 8
1280x1024: 8
1600x1200: 4
1920x1200: 4
2560x1600: 2
3840x2160: 1

Here is an example of a valid flavor for RemoteFX:

# nova flavor-create <name> <id> <ram> <disk> <vcpus>
nova flavor-create m1.remotefx 999 4096 40 2
nova flavor-key m1.remotefx set os:resolution=1920x1200
nova flavor-key m1.remotefx set os:monitors=1
nova flavor-key m1.remotefx set os:vram=1024
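
Once the flavor exists, booting a RemoteFX-enabled instance is just a regular boot with that flavor. Here is a minimal sketch; the image name and network ID below are placeholders rather than values from any real deployment:

# Placeholders: use an image and network that exist in your cloud
nova boot --flavor m1.remotefx --image windows-server-2016 \
    --nic net-id=<network-uuid> remotefx-demo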

 

We hope you enjoy this feature as much as we do! What would you use RemoteFX for?

The post Hyper-V RemoteFX in OpenStack appeared first on Cloudbase Solutions.

by Claudiu Belu at July 05, 2017 02:16 PM

Mirantis

What can NFV do for a business?

Telcos, multiple-system operators (MSOs, i.e. cable and satellite providers) and network providers are under pressure on several fronts.

by Amar Kapadia at July 05, 2017 12:57 PM

July 04, 2017

SUSE Conversations

SUSE Rocked the Stage at LC3 2017 in Beijing

The following article has been contributed by Grace Wang, QA Engineer, SUSE China. LinuxCon + ContainerCon + CloudOpen (LC3) was held on June 19-20, 2017 at the China National Convention Center in Beijing. LinuxCon has already been held successfully in North America, Europe and Japan over the past few years, and now it has finally come to China. During …

+read more

The post SUSE Rocked the Stage at LC3 2017 in Beijing appeared first on SUSE Blog.

by chabowski at July 04, 2017 11:08 AM

July 03, 2017

RDO

Recent blog posts, July 3

Here's what the community is blogging about lately.

OVS-DPDK Parameters: Dealing with multi-NUMA by Kevin Traynor

In Network Function Virtualization, there is a need to scale functions (VNFs) and infrastructure (NFVi) across multiple NUMA nodes in order to maximize resource usage.

Read more at https://developers.redhat.com/blog/2017/06/28/ovs-dpdk-parameters-dealing-with-multi-numa/

OpenStack Down Under – OpenStack Days Australia 2017 by August Simonelli, Technical Marketing Manager, Cloud

As OpenStack continues to grow and thrive around the world, the OpenStack Foundation continues to bring OpenStack events to all corners of the globe. From community-run meetups to more high-profile events like the larger Summits, there is probably an OpenStack event going on somewhere near you.

Read more at http://redhatstackblog.redhat.com/2017/06/26/openstack-down-under-openstack-days-australia-2017/

OpenStack versions - Upstream/Downstream by Carlos Camacho

I’m adding this note as I’m prone to forget how upstream and downstream versions are matching.

Read more at http://anstack.github.io/blog/2017/06/27/openstack-versions-upstream-downstream.html

Tom Barron - OpenStack Manila - OpenStack PTG by Rich Bowen

Tom Barron talks about the work on Manila in the Ocata release, at the OpenStack PTG in Atlanta.

Read more at http://rdoproject.org/blog/2017/07/tom-barron-openstack-manila-openstack-ptg/

Victoria Martinez de la Cruz: OpenStack Manila by Rich Bowen

Victoria Martinez de la Cruz talks Manila, Outreachy, at the OpenStack PTG in Atlanta

Read more at http://rdoproject.org/blog/2017/06/victoria-martinez-de-la-cruz-openstack-manila/

Ihar Hrachyshka - What's new in OpenStack Neutron for Ocata by Rich Bowen

Ihar Hrachyshka talks about his work on Neutron in Ocata, and what's coming in Pike.

Read more at http://rdoproject.org/blog/2017/06/ihar-hrachyshka-whats-new-in-openstack-neutron-for-ocata/

Introducing Software Factory - part 1 by Software Factory Team

Software Factory is an open-source software development forge with an emphasis on collaboration and ensuring code quality through Continuous Integration (CI). It is inspired by OpenStack's development workflow, which has proven to be reliable for fast-changing, interdependent projects driven by large communities.

Read more at http://rdoproject.org/blog/2017/06/introducing-Software-Factory-part-1/

Back to Boston! A recap of the 2017 OpenStack Summit by August Simonelli, Technical Marketing Manager, Cloud

This year the OpenStack® Summit returned to Boston, Massachusetts. The Summit was held the week after the annual Red Hat® Summit, which was also held in Boston. The combination of the two events, back to back, made for an intense, exciting and extremely busy few weeks.

Read more at http://redhatstackblog.redhat.com/2017/06/19/back-to-boston-a-recap-of-the-2017-openstack-summit/

by Rich Bowen at July 03, 2017 06:22 PM

Improving the RDO Trunk infrastructure, take 2

One year ago, we discussed the improvements made to the RDO Trunk infrastructure in this post. As expected, our needs have changed over the past year, and our infrastructure has had to change with them. So here we are, ready to describe what's new in RDO Trunk.

New needs

We have some new needs to cover:

  • A new DLRN API has been introduced, meant to be used by our CI jobs. The main goal behind this API is to break up the current long, hardcoded Jenkins pipelines we use to promote repositories, and instead have individual jobs "vote" on each repository, with some additional logic to decide which repository needs to be promoted. The API is a simple REST one, defined here; a sketch of how a job might report its vote follows right after this list.

  • This new API needs to be accessible for jobs running inside and outside the ci.centos.org infrastructure, which means we can no longer use a local SQLite3 database for each builder.

  • We now have an RDO Cloud available to use, so we can consolidate our systems there.

  • Additionally, hosting our CI-passed repositories in the CentOS CDN was not working as we expected, because we needed some additional flexibility that was just not possible there. For example, we could not remove a repository if it had been promoted by mistake.
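
As a rough illustration of the "vote" idea, a CI job could report its result back to the API with a single HTTP call once its tests finish. This is only a sketch: the endpoint, payload fields and authentication shown here are assumptions, so check the API definition linked above for the actual contract.

# Assumed endpoint, fields and auth – verify against the published DLRN API definition
curl -u ci-user:password \
    -H "Content-Type: application/json" \
    -X POST https://<dlrn-api-host>/api/report_result \
    -d '{"job_id": "example-ci-job", "commit_hash": "<hash>", "distro_hash": "<hash>",
         "url": "https://<log-server>/job/<id>", "timestamp": <epoch-seconds>,
         "success": true}'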

Our new setup

This is the current design for the RDO Trunk infrastructure:

New RDO Trunk infrastructure

  • We still have the build server inside the ci.centos.org infrastructure, and it is not reachable from the outside. This has proven to be a good solution, since we are separating content generation from content delivery.

  • https://trunk.rdoproject.org is now the URL to be used for all RDO Trunk users. It has worked very well so far, providing enough bandwidth for our needs.

  • The database has been taken out to an external MariaDB server, running on the RDO Cloud (dlrn-db.rdoproject.org). This database is set up as master-slave, with the slave running on an offsite cloud instance that also serves as a backup machine for other services. This required a patch to DLRN to add MariaDB support.
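
For reference, pointing a DLRN builder at the external database is essentially a connection-string change. A minimal sketch, assuming the option is still called database_connection in projects.ini (database name, user and password are placeholders):

[DEFAULT]
# SQLAlchemy-style URL; this replaces the previous local sqlite:///commits.sqlite
database_connection=mysql+pymysql://dlrn_user:<password>@dlrn-db.rdoproject.org/dlrn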

Future steps

Experience tells us that this setup will not stay like this forever, so we already have some plans for future improvements:

  • The build server will migrate to the RDO Cloud soon. Since we are no longer mirroring our CI-passed repositories on the CentOS CDN, it makes more sense to manage it inside the RDO infrastructure.

  • Our next step will be to make RDO Trunk scale horizontally, as described here. We want to use our nodepool VMs in review.rdoproject.org to build packages after each upstream commit is merged, then use the builder instance as an aggregator. That way, the hardware needs for this instance become much lower, since it just has to fetch the generated RPMs and create new repositories. Support for this feature is already in DLRN, so we just need to figure out how to do the rest.

by jpena at July 03, 2017 06:22 PM

OpenStack Superuser

Travel Support program brings key OpenStack contributors to the Summit

Key contributors from 18 countries attended the recent OpenStack Summit in Boston. Find out how the Travel Support Program could be your ticket to Australia.

Some summit participants have to travel great distances to attend, but may not always have the resources or support to do so. The OpenStack Foundation helps participants reach their attendance goal via the Travel Support Program.

The Foundation supported 40 people from 18 different countries to come and participate in the OpenStack Summit Boston.

The Travel Support Program is based on the premise of Open Design, a commitment to an open design process that welcomes the public, including users, developers and upstream projects. This year the program also included individual supporters who chose to donate frequent flyer miles or funds to assist the program’s efforts.

OpenStack Ambassador and support program recipient Lisa-Marie Namphy said it brought tears to her eyes seeing the names of individual contributors who gave funds featured at the Boston Summit keynote. She says their generosity made her feel valued as a volunteer and important to the OpenStack community.

The summit is also a great opportunity for participants to network and have important discussions regarding OpenStack contributions. Ranga Swami Reddy Muthumula, a developer from India, made his first trip to the United States and said that without the support he would not have been able to have key meetings with project technical leads (PTLs). Those meetings led to decisions that would otherwise have taken 14 months to secure. Similarly, Gene Kuo, co-organizer for the Taiwan user group, was able to meet other user group organizers and receive valuable advice, which he plans to apply in the future.

The deadline to apply for Travel Support to the Sydney Summit is August 22. Read these tips on how to apply for Travel Support.

The post Travel Support program brings key OpenStack contributors to the Summit appeared first on OpenStack Superuser.

by Sonia Ramza at July 03, 2017 05:37 PM

NFVPE @ Red Hat

Build and use security hardened images with TripleO

This starts to apply with the Pike release. Normally, the images used for overcloud deployment in TripleO are not security hardened: they lack the extra security measures needed to comply with ANSSI requirements. These extra measures are needed to deploy TripleO in environments where security is an important feature. The following recommendations are given to comply with the security guidelines: ensure that /tmp is mounted on a separate volume or partition, and that it is mounted with the rw,nosuid,nodev,noexec,relatime flags; ensure that /var, /var/log and /var/log/audit are mounted on separate volumes or partitions, and that they are mounted…
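
To make the mount recommendations in that excerpt concrete, here is an illustrative sketch; the volume names are placeholders, and the exact flags for the /var mount points are covered in the full post:

# Illustrative only – in a hardened image these mounts would be made persistent (e.g. via /etc/fstab)
mount -o rw,nosuid,nodev,noexec,relatime /dev/vg0/lv_tmp /tmp
mount /dev/vg0/lv_var   /var             # separate volume, flags per the full post
mount /dev/vg0/lv_log   /var/log         # separate volume
mount /dev/vg0/lv_audit /var/log/audit   # separate volume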

by Yolanda Robla Mota at July 03, 2017 03:54 PM

June 30, 2017

OpenStack Superuser

Cross-community collaboration: strengthening ties with OPNFV

Collaboration is crucial in the land of open source, and with the rapid growth of projects and communities, cross-community activities are more important than ever. As an example of OpenStack’s dedication to cross community collaboration, we are working closely with the OPNFV community to help accelerate the adoption and innovation of Network Function Virtualization (NFV).

Open Source Days in Boston and Sydney

The OpenStack Summit Boston featured a new addition: Open Source Days. A variety of open source communities from areas like containers and networking were invited to join the Summit to accelerate collaboration, share their project roadmaps and support innovation. Among them were Kubernetes, FD.io, OPNFV and Cloudify. OpenStack has been increasing its work with adjacent communities, and Open Source Days was a natural extension of that work. We plan to host the Open Source Days again at the Sydney Summit, November 6-8, 2017.

OPNFV Summit

We brought a version of the OpenStack Upstream Institute to the recent OPNFV Summit in Beijing, China. We tailored the training to fit into the event structure and also to address the needs of OPNFV developers working on OpenStack upstream.

By playing the roles of the VIM and NFVI, OpenStack is one of the main building blocks of the OPNFV reference platform, which was reflected in OpenStack's presence during the conference in presentations and Design Summit sessions. We discussed ongoing design and development activities in areas like integration and interoperability testing, cross-community CI, edge computing, resource reservation and standardization.

Through close collaboration between these two communities, we are taking steps to address more and more specific telecom and NFV-related gaps and needs in OpenStack.

“OPNFV is all about cross-community collaboration and we look for opportunities to collaborate with the OpenStack community whenever possible,” says Brandon Wick, head of OPNFV Marketing, The Linux Foundation. “OPNFV held a successful OPNFV Day during the recent OpenStack Summit in Boston and it was great that OpenStack chose to hold the OpenStack Upstream Institute training at the OPNFV Summit in Beijing. Together, our projects provide a comprehensive, innovative, and constantly evolving set of NFV capabilities for end users and we welcome new developers to join us on our path towards open source NFV.”

ildiko Vancsa speaking at the recent OPNFV Summit.

Future plans

This is just the beginning, with much more on the horizon. In order to continue the evolution of technology, the entire industry needs to play and work together. The open source environment is the most suitable one to support this process and to provide opportunities for companies and individuals to start working on solutions to tomorrow's challenges today.

You can also follow the joint OpenStack and OPNFV activities on the wiki pages and version control systems of both communities, and we hope to see you at the OPNFV Open Source Days sessions at the OpenStack Summit Sydney.

The post Cross-community collaboration: strengthening ties with OPNFV appeared first on OpenStack Superuser.

by Kendall Nelson and ildiko Vancsa at June 30, 2017 04:41 PM

June 29, 2017

NFVPE @ Red Hat

Look ma, No Docker! Kubernetes with CRI-O, and no Docker at all!

This isn’t just a stunt like riding a bike with no hands – it’s probably the future of how we’ll use Kubernetes. Today, we’re going to spin up Kubernetes using CRI-O, which uses the Kubernetes Container Runtime Interface with OCI (Open Container Initiative) compatible runtimes. That’s a mouthful, but the gist is: it’s a way to use Kubernetes without Docker! That’s what we’ll do today. And to add a cherry on top, we’re also going to build a container image without Docker, too. We won’t go in depth on images today – our goal will be to get Kubernetes up without Docker, with CRI-O, and we’ll run a pod on it to prove it out.

by Doug Smith at June 29, 2017 01:05 PM

OpenStack Superuser

OpenStack Cramming, Part I

 It can be a bit intimidating when you’re new to OpenStack. Here are a few tidbits to help kick-start your journey.

What is OpenStack?

Think of OpenStack as an open-source cloud infrastructure management system used to automate and control large pools of compute, storage, and data center resources. Various flavors of OpenStack are also available, including enterprise-grade options that offer services such as support, upgrades, and APIs for vendor-specific ecosystems.

The OpenStack Project itself is a collaborative development initiative whose goal is to produce the open standard cloud platform for public and private clouds. A collective group of global developers and technologists collaborate on OpenStack projects throughout the year and meet every six months at OpenStack summits, where they share lessons learned, gather requirements, discuss features, and collaborate on upcoming releases.

OpenStack Summits

The most recent OpenStack summit was held in Boston in May, and the next will be in Sydney on November 6-8. Over 5,000 attendees representing about a thousand companies from 63 countries attended the Boston summit. So yeah, be sure to tell your boss that all the cool kids go when you make your case for travel. Details of the Sydney summit are available at https://www.openstack.org/summit/sydney-2017/.

Over 700 sessions were offered at the Boston summit. Unless you’re a time traveler, there’s no way to attend them all. Not all sessions are recorded, but the OpenStack Foundation does make a good number of them available online. Boston’s video repository can be found at: https://www.openstack.org/videos/summits/boston-2017.

OpenStack initiatives

The OpenStack Foundation promotes the advancement of OpenStack, serving the OpenStack community (developers, users, service providers, vendors, etc.) and providing shared resources for the entire ecosystem organized around three major aspects of virtualization.

The compute service, Nova, involves provisioning and managing large networks of virtual machines (VMs).

 

The object storage service, Swift, handles object (data) storage, while Cinder provides block storage for servers and applications.

 

The networking service, Neutron, is a software-defined-networking project focused on delivering networking-as-a-service (NaaS).
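
To make those services a little more concrete, here is a minimal sketch of how they show up from the unified command-line client; the flavor, image and network names are typical defaults rather than values from any particular cloud:

# Requires python-openstackclient and sourced credentials (an openrc file)
openstack service list                            # lists nova, neutron, cinder, swift, ...
openstack server create --flavor m1.small --image cirros \
    --network private demo-vm                     # Nova boots the VM, Neutron wires it up
openstack volume create --size 1 demo-vol         # Cinder provides the block storage
openstack server add volume demo-vm demo-vol      # attach the volume to the instance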


There are actually more than 60 OpenStack Projects under development, and each has been assigned its own project mascot. You can see them all at https://www.openstack.org/project-mascots.

Random OpenStack facts: Releases

OpenStack releases typically occur every six months. The latest release, Ocata, broke the mold with a four-month development cycle. And while Ocata was released only this past February, the forthcoming releases have already been named. Pike is on deck, to be followed by Queens.

Release names are also associated with nearby cities, counties, or distinguishing regional features of the location of the summit corresponding to each OpenStack release. For example, the Ocata release is named after a beach in El Masnou, Spain, just a short drive from where the 2016 summit was held. It’s also rated #1 on TripAdvisor among 13 attractions in that area.

There are other special rules associated with release names that mandate creativity from the OpenStack Foundation members. In addition to the alphabetical order requirement, the name must be a single word with a maximum of ten characters and must not describe the feature. “Ocata Beach,” for example, wouldn’t meet those requirements. For more information on the rules of release naming, visit https://wiki.openstack.org/wiki/Release_Naming

OpenStack adoption

Hundreds of the world’s largest brands run on OpenStack. Its adoption spans multiple industries. Automotive, finance, media, energy, and telecom are just a few examples. 50 percent of Fortune 100 companies have deployed it — and adoption is still growing.

OpenStack is particularly critical to the strategies of many Tier 1 service providers. In North America, AT&T’s Integrated Cloud (AIC) and Verizon’s universal CPE (or cloud-in-the-box) are both examples worth noting.

Stay tuned for OpenStack Cramming, Part II…

Beverly Ibarrola is director of strategic partnerships, SDN and NFV at Award Solutions. This post first appeared on LinkedIn.

Superuser is always interested in community content – get in touch at editorATopenstack.org.

The post OpenStack Cramming, Part I appeared first on OpenStack Superuser.

by Beverly Ibarrola at June 29, 2017 11:00 AM

Opensource.com

This is how you OpenStack: 6 new guides and tutorials

Want to learn more about the ins and outs of OpenStack? These free resources could be just what you need.

by Jason Baker at June 29, 2017 07:03 AM

June 28, 2017

Mirantis

Network slicing and 5G and wireless, oh my!

If you're not in the telecom business, you probably haven't given much thought to the upcoming 5G standard, except perhaps to wonder when your phone will have faster data.

by Nick Chase at June 28, 2017 08:26 PM

OpenStack Superuser

Catch these updates on OpenStack community goals, strategies

Superuser TV was on the ground at the OpenStack Summit Boston talking to community members about everything from edge computing to OPNFV.

A few highlights, in case you missed them:

You can check out all 26 interviews in a playlist on the OpenStack Foundation’s YouTube page.

The post Catch these updates on OpenStack community goals, strategies appeared first on OpenStack Superuser.

by Superuser at June 28, 2017 11:57 AM

June 27, 2017

Ben Nemec

TripleO Network Isolation Template Generator Update

Just a quick update on the TripleO Network Isolation Template Generator. A few new features have been added recently that may be of interest.

The first, and most broadly applicable, is that the tool can now generate either old-style os-apply-config based templates, or new-style tripleo-heat-templates native templates. The latter are an improvement because they allow for much better error handling, and if bugs are found it is much easier to fix them. If your deployment is using Ocata or newer TripleO then you'll want to use the version 2 templates. If you need to support older releases, select version 1.

In addition, support for some additional object types has been added. In particular, the tool can now generate templates for OVS DPDK deployments. I don't have any way to test these templates, unfortunately, so the output is based solely on the examples in the os-net-config repo. Hopefully those are accurate. :-)

If you do try any of the new (or old) features of the tool and have feedback don't hesitate to let me know. To my knowledge I'm still the primary user of the tool so it would be nice to know what, if anything, other people are doing with it.

by bnemec at June 27, 2017 08:52 PM

About

Planet OpenStack is a collection of thoughts from the developers and other key players of the OpenStack projects. If you are working on OpenStack technology you should add your OpenStack blog.

Subscriptions

Last updated:
July 22, 2017 12:38 PM
All times are UTC.

Powered by:
Planet