October 21, 2014

IBM OpenStack Team

IBM contributions to OpenStack go beyond the code

Co-authored by Manuel Silveyra

In just four years, OpenStack has become the largest and most active open source project—not just the hottest cloud technology. As of the October 16, 2014, Juno release, the overall number of contributors has surpassed 2,500 and there have been nearly 130,000 code commits. In 2014 alone, there’s been an average of 4,000 source code improvements per month.

As is the case with most open source projects, code contributions are the highest-profile indicator of project vitality, as the metrics we called out first suggest. But there are other important activities around an open source project that also contribute to community health and user uptake.

Our colleague Brad Topol recently summarized the major advancements made by the community in the most recent OpenStack Juno release. He also highlighted the IBM-specific additions, which fall into five major categories:

  • Enterprise security: Contributions to Keystone to enable better hybrid cloud integration and auditing
  • Block storage: Improvements to the resiliency and troubleshooting of Cinder storage volumes
  • User experience: Internationalization and usability upgrades for Horizon
  • Compute management: Improved automation and integration by simplifying the Nova application programming interfaces (APIs)
  • Interoperability: Leading work to ensure that OpenStack vendor implementations are compatible.

These technical contributions are great, but they are only one part of the overall support that IBM has provided for the OpenStack project. Like Linux, Apache and Eclipse before it, OpenStack benefits from IBM activities such as:

If you are attending the Summit in Paris next month, come to our session to learn about these and other IBM contributions to OpenStack. Our goal is to show that there are many ways for individuals and organizations to contribute to an open source project, beyond writing code, and we would like to encourage others to take part.

(Related: IBM technical sessions at the OpenStack Summit)

We also want to hear your suggestions on how IBM can better contribute to Kilo, the next major OpenStack release after Juno. Let us know at the Summit, or on Twitter @DanielKrook and @manuel_silveyra.

Manuel Silveyra is a Senior Cloud Solutions Architect working for the IBM Cloud Performance team in Austin. He is dedicated to bringing open cloud technologies such as OpenStack, Cloud Foundry, and Docker to enterprise clients.

The post IBM contributions to OpenStack go beyond the code appeared first on Thoughts on Cloud.

by Daniel Krook at October 21, 2014 06:28 PM

Solinea

Making the Case for OpenStack—Critical Success Factors (Part 2)

Last week I wrote about some of the challenges to successfully implementing OpenStack in the enterprise. The biggest obstacles have nothing to do with technology, but rather have to do with Governance, Processes and Skills.



by Francesco Paola (fpaola@solinea.com) at October 21, 2014 02:00 PM

OpenStack in Production

Kerberos and Single Sign On with OpenStack

External Authentication with Keystone

One of the most commonly requested features by the CERN cloud user community is support for authentication using Kerberos on the command line and single-sign on with the OpenStack dashboard.

In our Windows and Linux environment, we run Active Directory to provide authentication services. During the Essex cycle of OpenStack, we added support for authentication based on Active Directory passwords. However, this had several drawbacks:
  • When using the command line clients, the users had the choice of storing their password in environment variables, such as with the local openrc script, or re-typing their password with each OpenStack command. Passwords in environment variables carry significant security risks, since they are passed to every sub-command and can be read by the system administrator of the server you are on.
  • When logging in with the web interface, the users were entering their password into the dashboard. Most of CERN's applications use a single sign on package with Active Directory Federation Services (ADFS). Recent problems such as Heartbleed show the risks of entering passwords into web applications.
The following describes how we configured this functionality.

Approach

With our upgrade to Icehouse completed last week and the new release of the v3 identity API, Keystone now supports several authentication mechanisms through plugins. By default, password, token and external authentication are provided. Other authentication methods such as Kerberos or X.509 can be used with a proper Apache configuration and the external plugin provided in Keystone. Unfortunately, when enabling these methods in Apache, there is no way to make them optional so that the client can choose the most appropriate one.

Also, when checking which projects it can access, the client normally performs two operations on Keystone: one to retrieve the token, and another, using that token, to retrieve the project list. Even if the API version is specified in the environment variables, the second call always uses the catalog, so if the catalog advertises version 2 while we are using version 3 we get an exception while doing the API call.

Requirements

In this case we need a solution that allows us to use Kerberos, X.509 or another authentication mechanism in a transparent way while remaining backwards compatible, so we can offer both APIs and let users choose which is most appropriate for their workflow. This will allow us to migrate services from one API version to the next with no downtime.

In order to allow external authentication for our clients, we need to cover two parts: the client side, to select which auth plugin to use, and the server side, to allow multiple auth methods and API versions at once.

Server Solution

In order to have different entry points under the same API, we need a load balancer; in this particular case we use HAproxy. From this load balancer we call two different sets of backend machines, one for version 2 of the API and the other for version 3. In the load balancer we can inspect the version in the URL the client is connecting to and redirect it to the appropriate set. Each backend runs Keystone under Apache and is connected to the same database. We need this so that tokens can be validated no matter which version is used by the client. The only difference between the backend sets is the catalog: the identity service entry differs on each, pointing the client to the version available on that set. For this particular purpose we use a templated catalog.


This solves the multi-version issue of our OpenStack environment, but it does not yet allow Kerberos or X.509. As these methods cannot be made optional, we need a different entry point for each authentication plugin used: one for standard OpenStack authentication (password, token), one for Kerberos and one for X.509. There is no issue with the catalog if we enable these methods; all of them can be registered in the service catalog like normal OpenStack authentication, because any subsequent call on the system uses token-based authentication.
So in the apache v3 backend we have the following urls defined:

https://mykeystone/main/v3
https://mykeystone/admin/v3
https://mykeystone/krb/v3
https://mykeystone/x509/v3

If you post an authentication request to the Kerberos URL, it will require a valid Kerberos token and will initiate a challenge if none is sent. After validating it, Apache sets the user as REMOTE_USER. For client certificate authentication, you use the X.509 URL, which requires a valid certificate; in this case the certificate DN is used as REMOTE_USER. Once this variable is set, Keystone can take over and look up the user in the Keystone database.
There is a small caveat: we cannot offload SSL client authentication onto HAproxy, so for X.509 the client has to connect directly to the configured backends on a different port (8443). For X.509 authentication we therefore use 'https://mykeystone:8443/x509/v3'.
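
As a rough illustration, an authentication call against the Kerberos-protected endpoint could look like the following Python sketch, using the requests and requests-kerberos libraries. The endpoint comes from the setup above, but the exact JSON body expected by the external plugin is an assumption here, not something taken from our deployment:

import json
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

# Endpoint defined in the HAproxy/Apache configuration above; assumes a valid
# Kerberos ticket (kinit) is already present in the user's credential cache.
KRB_AUTH_URL = 'https://mykeystone/krb/v3'

# Assumed request body for the Keystone v3 external auth method.
body = {'auth': {'identity': {'methods': ['external'], 'external': {}}}}

resp = requests.post(KRB_AUTH_URL + '/auth/tokens',
                     data=json.dumps(body),
                     headers={'Content-Type': 'application/json'},
                     auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL))
print(resp.status_code, resp.headers.get('X-Subject-Token'))

mod_auth_kerb answers the Negotiate challenge and sets REMOTE_USER, after which Keystone looks the user up and returns the token in the X-Subject-Token header.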

Client Solution

For the client side, the plugin mechanism will only be available in the common CLI (python-openstackclient) and not in the rest of the toolset (nova, glance, cinder, ...). There is no upstream code yet that implements the plugin functionality, so in order to provide a short-term implementation, and based on our current architecture, we base the selection of the plugin on OS_AUTH_URL for the moment. The final upstream implementation will almost certainly differ at this point, by using a parameter or by discovering the available auth plugins. In that case the client implementation may change, but it is likely to remain close to this initial implementation.

In openstackclient/common/clientmanager.py
...
        if 'krb' in auth_url and ver_prefix == 'v3':
            LOG.debug('Using kerberos auth %s', ver_prefix)
            self.auth = v3_auth_kerberos.Kerberos(
                auth_url=auth_url,
                trust_id=trust_id,
                domain_id=domain_id,
                domain_name=domain_name,
                project_id=project_id,
                project_name=project_name,
                project_domain_id=project_domain_id,
                project_domain_name=project_domain_name,
            )
        elif 'x509' in auth_url and ver_prefix == 'v3':
            LOG.debug('Using x509 auth %s', ver_prefix)
            self.auth = v3_auth_x509.X509(
                auth_url=auth_url,
                trust_id=trust_id,
                domain_id=domain_id,
                domain_name=domain_name,
                project_id=project_id,
                project_name=project_name,
                project_domain_id=project_domain_id,
                project_domain_name=project_domain_name,
                client_cert=client_cert,
            )
        elif self._url:
...
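
In practice, a user then selects the authentication mechanism simply by pointing OS_AUTH_URL at the corresponding endpoint, for example https://mykeystone/krb/v3 for Kerberos or https://mykeystone:8443/x509/v3 for client certificates, before invoking the OpenStack CLI.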

HAproxy configuration

global
  chroot  /var/lib/haproxy
  daemon
  group  haproxy
  log  mysyslogserver local0
  maxconn  8000
  pidfile  /var/run/haproxy.pid
  ssl-default-bind-ciphers  ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128:AES256:AES:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK
  stats  socket /var/lib/haproxy/stats
  tune.ssl.default-dh-param  2048
  user  haproxy

defaults
  log  global
  maxconn  8000
  mode  http
  option  redispatch
  option  http-server-close
  option  contstats
  retries  3
  stats  enable
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

frontend cloud_identity_api_production
  bind 188.184.148.158:443 ssl no-sslv3 crt /etc/haproxy/cert.pem verify none
  acl  v2_acl_admin url_beg /admin/v2
  acl  v2_acl_main url_beg /main/v2
  default_backend  cloud_identity_api_v3_production
  timeout  http-request 5m
  timeout  client 5m
  use_backend  cloud_identity_api_v2_production if v2_acl_admin
  use_backend  cloud_identity_api_v2_production if v2_acl_main

frontend cloud_identity_api_x509_production
  bind 188.184.148.158:8443 ssl no-sslv3 crt /etc/haproxy/cert.pem ca-file /etc/haproxy/ca.pem verify required
  default_backend  cloud_identity_api_v3_production
  rspadd  Strict-Transport-Security:\ max-age=15768000
  timeout  http-request 5m
  timeout  client 5m
  use_backend  cloud_identity_api_v3_production if { ssl_fc_has_crt }

backend cloud_identity_api_v2_production
  balance  roundrobin
  stick  on src
  stick-table  type ip size 20k peers cloud_identity_frontend_production
  timeout  server 5m
  timeout  queue 5m
  timeout  connect 5m
  server cci-keystone-bck01 128.142.132.22:443 check ssl verify none
  server cci-keystone-bck02 188.184.149.124:443 check ssl verify none
  server p01001453s11625 128.142.174.37:443 check ssl verify none

backend cloud_identity_api_v3_production
  balance  roundrobin
  http-request  set-header X-SSL-Client-CN %{+Q}[ssl_c_s_dn(cn)]
  stick  on src
  stick-table  type ip size 20k peers cloud_identity_frontend_production
  timeout  server 5m
  timeout  queue 5m
  timeout  connect 5m
  server cci-keystone-bck03 128.142.159.38:443 check ssl verify none
  server cci-keystone-bck04 128.142.164.244:443 check ssl verify none
  server cci-keystone-bck05 128.142.132.192:443 check ssl verify none
  server cci-keystone-bck06 128.142.146.182:443 check ssl verify none

listen stats
  bind 188.184.148.158:8080
  stats  uri /
  stats  auth haproxy:toto1TOTO$

peers cloud_identity_frontend_production
  peer cci-keystone-load01.cern.ch 188.184.148.158:7777
  peer cci-keystone-load02.cern.ch 128.142.153.203:7777
  peer p01001464675431.cern.ch 128.142.190.8:7777

Apache configuration

WSGISocketPrefix /var/run/wsgi

Listen 443

<VirtualHost *:443>
  ServerName keystone.cern.ch
  DocumentRoot /var/www/cgi-bin/keystone
  LimitRequestFieldSize 65535

  SSLEngine On
  SSLCertificateFile      /etc/keystone/ssl/certs/hostcert.pem
  SSLCertificateKeyFile   /etc/keystone/ssl/keys/hostkey.pem
  SSLCertificateChainFile /etc/keystone/ssl/certs/ca.pem
  SSLCACertificateFile    /etc/keystone/ssl/certs/ca.pem
  SSLVerifyClient         none
  SSLOptions              +StdEnvVars
  SSLVerifyDepth          10
  SSLUserName             SSL_CLIENT_S_DN_CN
  SSLProtocol             all -SSLv2 -SSLv3

  SSLCipherSuite          ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128:AES256:AES:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK
  SSLHonorCipherOrder     on
  Header add Strict-Transport-Security "max-age=15768000"


  WSGIDaemonProcess keystone user=keystone group=keystone processes=2 threads=2
  WSGIProcessGroup keystone

  WSGIScriptAlias /admin /var/www/cgi-bin/keystone/admin
  <Location "/admin">
    SSLRequireSSL
    SSLVerifyClient       none
  </Location>

  WSGIScriptAlias /main /var/www/cgi-bin/keystone/main
  <Location "/main">
    SSLRequireSSL
    SSLVerifyClient       none
  </Location>

  WSGIScriptAlias /krb /var/www/cgi-bin/keystone/main

  <Location "/krb">
    SSLRequireSSL
    SSLVerifyClient       none
  </Location>

  <Location "/krb/v3/auth/tokens">
    SSLRequireSSL
    SSLVerifyClient       none
    AuthType              Kerberos
    AuthName              "Kerberos Login"
    KrbMethodNegotiate    On
    KrbMethodK5Passwd     Off
    KrbServiceName        Any
    KrbAuthRealms         CERN.CH
    Krb5KeyTab            /etc/httpd/http.keytab
    KrbVerifyKDC          Off
    KrbLocalUserMapping   On
    KrbAuthoritative      On
    Require valid-user
  </Location>

  WSGIScriptAlias /x509 /var/www/cgi-bin/keystone/main

  <Location "/x509">
    Order allow,deny
    Allow from all
  </Location>

  WSGIScriptAliasMatch ^(/main/v3/OS-FEDERATION/identity_providers/.*?/protocols/.*?/auth)$ /var/www/cgi-bin/keystone/main/$1

  <LocationMatch /main/v3/OS-FEDERATION/identity_providers/.*?/protocols/saml2/auth>
    ShibRequestSetting requireSession 1
    AuthType shibboleth
    ShibRequireSession On
    ShibRequireAll On
    ShibExportAssertion Off
    Require valid-user
  </LocationMatch>

  <LocationMatch /main/v3/OS-FEDERATION/websso>
    ShibRequestSetting requireSession 1
    AuthType shibboleth
    ShibRequireSession On
    ShibRequireAll On
    ShibExportAssertion Off
    Require valid-user
  </LocationMatch>

  <Location /Shibboleth.sso>
    SetHandler shib
  </Location>

  <Directory /var/www/cgi-bin/keystone>
    Options FollowSymLinks
    AllowOverride All
    Order allow,deny
    Allow from all
  </Directory>
</VirtualHost>

References

The code of python-openstackclient, as well as the python-keystoneclient code that we are using for this implementation, is available at:


We will be working with the community in the Paris summit to find the best way to integrate this functionality into the standard OpenStack release.

Credits

The main author is Jose Castro Leon with help from Marek Denis.

Many thanks to the Keystone core team for their help and advice on the implementation.

by Tim Bell (noreply@blogger.com) at October 21, 2014 01:19 PM

ICCLab

8th Swiss Openstack meetup


Last week, on 16 October 2014, we saw great participation at the OpenStack User Group meeting @ICCLab Winterthur, which we co-located with the Docker CH meetup. Around 60 participants from both user groups attended.

For this event, we organised the agenda to have a good mix of presentations from big players and from developers. The goals: analysis of OpenStack solutions, deployments and container solutions.

Final Agenda  start: 18.00

Snacks and drinks were kindly offered by ZHAW and Mirantis.

We had some interesting technical discussions and Q&A with some speakers during the evening apero, as usual.

 


by Antonio Cimmino at October 21, 2014 12:20 PM

Joshua Hesketh

OpenStack infrastructure swift logs and performance

Turns out I’m not very good at blogging very often. However I thought I would put what I’ve been working on for the last few days here out of interest.

For a while the OpenStack Infrastructure team have wanted to move away from storing logs on disk to something more cloudy – namely, swift. I’ve been working on this on and off for a while and we’re nearly there.

For the last few weeks the openstack-infra/project-config repository has been uploading its CI test logs to swift as well as storing them on disk. This has given us the opportunity to compare the last few weeks of data and see what kind of effects we can expect as we move assets into an object storage.

  • I should add a disclaimer/warning, before you read, that my methods here will likely make statisticians cringe horribly. For the moment though I’m just getting an indication for how things compare.

The set up

Fetching files from an object storage is nothing particularly new or special (CDNs have been doing it for ages). However, for our usage we want to serve logs with os-loganalyze, giving the opportunity to hyperlink to timestamp anchors or filter by log severity.

First though we need to get the logs into swift somehow. This is done by having the job upload its own logs. Rather than using (or writing) a Jenkins publisher we use a bash script to grab the job’s own console log (pulled from the Jenkins web ui) and then upload it to swift using credentials supplied to the job as environment variables (see my zuul-swift contributions).

This does, however, mean part of the logs are missing. For example the fetching and upload processes write to Jenkins’ console log but because it has already been fetched these entries are missing. Therefore this wants to be the very last thing you do in a job. I did see somebody do something similar where they keep the download process running in a fork so that they can fetch the full log but we’ll look at that another time.

When a request comes into logs.openstack.org, it is handled like so:

  1. apache vhost matches the server
  2. if the request ends in .txt.gz, console.html or console.html.gz rewrite the url to prepend /htmlify/
  3. if the requested filename is a file or folder on disk, serve it up with apache as per normal
  4. otherwise rewrite the requested file to prepend /htmlify/ anyway

os-loganalyze is set up as a WSGIScriptAlias at /htmlify/. This means all files that aren’t on disk are sent to os-loganalyze (or if the file is on disk but matches a file we want to mark up it is also sent to os-loganalyze). os-loganalyze then does the following:

  1. Checks the requested file path is legitimate (or throws a 400 error)
  2. Checks if the file is on disk
  3. Checks if the file is stored in swift
  4. If the file is found, markup (such as anchors) is optionally added and the request is served
    1. When serving from swift the file is fetched via the swiftclient by os-loganalyze in chunks and streamed to the user on the fly. Obviously fetching from swift will have larger network consequences.
  5. If no file is found, 404 is returned

If the file exists both on disk and in swift then step #2 can be skipped by passing ?source=swift as a parameter (thus only attempting to serve from swift). In our case the files exist both on disk and in swift since we want to compare the performance so this feature is necessary.

So now that we have the logs uploaded into swift and stored on disk we can get into some more interesting comparisons.

Testing performance process

My first attempt at this was simply to fetch the files from disk and then from swift and compare the results. A crude little python script did this for me: http://paste.openstack.org/show/122630/

The script fetches a copy of the log from disk and then from swift (both through os-loganalyze and therefore marked-up) and times the results. It does this in two scenarios:

  1. Repeatably fetching the same file over again (to get a good average)
  2. Fetching a list of recent logs from gerrit (using the gerrit api) and timing those

I then ran this in two environments.

  1. On my local network the other side of the world to the logserver
  2. On 5 parallel servers in the same DC as the logserver

Running on my home computer likely introduced a lot of errors due to my limited bandwidth, noisy network and large network latency. To help eliminate these errors I also tested it on 5 performance servers in the Rackspace cloud next to the log server itself. In this case I used ansible to orchestrate the test nodes thus running the benchmarks in parallel. I did this since in real world use there will often be many parallel requests at once affecting performance.

The following metrics are measured for both disk and swift:

  1. request sent – time taken to send the http request from my test computer
  2. response – time taken for a response from the server to arrive at the test computer
  3. transfer – time taken to transfer the file
  4. size – filesize of the requested file

The total time can be found by adding the first 3 metrics together.
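
For reference, a stripped-down version of such a timing loop might look like the sketch below (the real script is linked above; the URL is a placeholder, the timing split is simplified to response and transfer only, and the ?source=swift parameter behaves as described earlier):

import time
import requests

# Placeholder log URL; any marked-up log served by os-loganalyze would do.
LOG_URL = 'http://logs.openstack.org/some/job/console.html'

def timed_fetch(url):
    """Return rough (response, transfer) times in seconds and the body size."""
    start = time.time()
    resp = requests.get(url, stream=True)          # returns once headers arrive
    response_time = time.time() - start
    body = b''.join(resp.iter_content(chunk_size=64 * 1024))
    transfer_time = time.time() - start - response_time
    return response_time, transfer_time, len(body)

for source in ('disk', 'swift'):
    # ?source=swift makes os-loganalyze skip the on-disk copy (step #2 above).
    url = LOG_URL if source == 'disk' else LOG_URL + '?source=swift'
    print(source, timed_fetch(url))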

 

Results

Home computer, sequential requests of one file

 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=2145436239&format=interactive]

The complementary colours are the same metric and the darker line represents swift’s performance (over the lighter disk performance line). The vertical lines over the plots are the error bars while the fetched filesize is the column graph down the bottom. Note that the transfer and file size metrics use the right axis for scale while the rest use the left.

As you would expect the requests for both disk and swift files are more or less comparable. We see a more noticeable difference on the responses though, with swift being slower. This is because disk is checked first, and if the file isn’t found on disk then a connection is sent to swift to check there. Clearly this is going to be slower.

The transfer times are erratic and varied. We can’t draw much from these, so let’s keep analyzing deeper.

The total time from request to transfer can be seen by adding the times together. I didn’t do this as when requesting files of different sizes (in the next scenario) there is nothing worth comparing (as the file sizes are different). Arguably we could compare them anyway as the log sizes for identical jobs are similar but I didn’t think it was interesting.

The file sizes are there for interest sake but as expected they never change in this case.

You might notice that the end of the graph is much noisier. That is because I’ve applied some rudimentary data filtering.

                     request sent (ms)           response (ms)               transfer (ms)               size (KB)
                     disk          swift         disk          swift         disk          swift         disk          swift
Standard Deviation   54.89516183   43.71917948   56.74750291   194.7547117   849.8545127   838.9172066   7.121600095   7.311125275
Mean                 283.9594368   282.5074598   373.7328851   531.8043908   5091.536092   5122.686897   1219.804598   1220.735632

 

I know it’s argued as poor practice to remove outliers using twice the standard deviation, but I did it anyway to see how it would look. I only did one pass at this even though I calculated new standard deviations.

 

                     request sent (ms)           response (ms)               transfer (ms)               size (KB)
                     disk          swift         disk          swift         disk          swift         disk          swift
Standard Deviation   13.88664039   14.84054789   44.0860569    115.5299781   541.3912899   515.4364601   7.038111654   6.98399691
Mean                 274.9291111   276.2813889   364.6289583   503.9393472   5008.439028   5013.627083   1220.013889   1220.888889

 

I then moved the outliers to the end of the results list instead of removing them completely and used the newly calculated standard deviation (ie without the outliers) as the error margin.
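
The filtering itself is nothing fancy; roughly, it amounts to the following sketch (the sample values are made up):

import statistics

def split_outliers(values, n_sigma=2):
    """Single pass: keep values within n_sigma standard deviations of the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    kept = [v for v in values if abs(v - mean) <= n_sigma * stdev]
    outliers = [v for v in values if abs(v - mean) > n_sigma * stdev]
    return kept, outliers

# Made-up response times in ms, with one obviously slow request.
samples = [274.9, 280.1, 283.2, 276.4, 279.0, 282.5, 277.8, 281.3, 275.6, 1200.5]
kept, outliers = split_outliers(samples)
error_margin = statistics.pstdev(kept)   # new std dev, calculated without outliers
ordered = kept + outliers                # outliers moved to the end of the list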

Then to get a better indication of what are average times I plotted the histograms of each of these metrics.

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=732438212&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=115390465&format=interactive]

Here we can see a similar request time.
 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1644363181&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=434940837&format=interactive]

Here it is quite clear that swift is slower at actually responding.
 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1719303791&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1964116949&format=interactive]

Interestingly both disk and swift sources have a similar total transfer time. This is perhaps an indication of my network limitation in downloading the files.

 

Home computer, sequential requests of recent logs

Next from my home computer I fetched a bunch of files in sequence from recent job runs.

 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1688949678&format=interactive]

 

Again I calculated the standard deviation and average to move the outliers to the end and get smaller error margins.

                     request sent (ms)           response (ms)               transfer (ms)               size (KB)
                     disk          swift         disk          swift         disk          swift         disk          swift
Standard Deviation   54.89516183   43.71917948   194.7547117   56.74750291   849.8545127   838.9172066   7.121600095   7.311125275
Mean                 283.9594368   282.5074598   531.8043908   373.7328851   5091.536092   5122.686897   1219.804598   1220.735632
Second pass without outliers
Standard Deviation   13.88664039   14.84054789   115.5299781   44.0860569    541.3912899   515.4364601   7.038111654   6.98399691
Mean                 274.9291111   276.2813889   503.9393472   364.6289583   5008.439028   5013.627083   1220.013889   1220.888889

 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=963200514&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1689771820&format=interactive]

What we are probably seeing here with the large number of slower requests is network congestion in my house. Since the script alternates requests between disk and swift (disk, swift, disk, swift, and so on), the congestion evens out, causing the latency seen in both sources.
 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=346021785&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=10713262&format=interactive]

Swift is very much slower here.

 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1488676353&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1384537917&format=interactive]

The transfer times, though, are comparable. Again this is likely due to my network limitation.
 

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=1494494491&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1xWNXG2RGK6AhAF4QsdHbPtC2TFM9wNB8OYPMvMvgd1U/pubchart?oid=604459439&format=interactive]

The size histograms don’t really add much here.
 

Rackspace Cloud, parallel requests of same log

Now to reduce latency and other network effects I tested fetching the same log over again in 5 parallel streams. Granted, it may have been interesting to see a machine close to the log server do a bunch of sequential requests for the one file (with little other noise) but I didn’t do it at the time unfortunately. Also we need to keep in mind that others may be accessing the log server, and therefore any request in both my testing and normal use is going to have competing load.
 

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=1688949678&format=interactive]

I collected a much larger amount of data here making it harder to visualise through all the noise and error margins etc. (Sadly I couldn’t find a way of linking to a larger google spreadsheet graph). The histograms below give a much better picture of what is going on. However out of interest I created a rolling average graph. This graph won’t mean much in reality but hopefully will show which is faster on average (disk or swift).
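
The rolling average here is just a trailing window mean over the raw per-request timings, roughly like this sketch (the window size is arbitrary):

def rolling_average(values, window=20):
    """Trailing rolling average used to smooth noisy per-request timings."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed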
 

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=1484304295&format=interactive]

You can see now that we’re closer to the server that swift is noticeably slower. This is confirmed by the averages:

 

                     request sent (ms)           response (ms)               transfer (ms)               size (KB)
                     disk          swift         disk          swift         disk          swift         disk          swift
Standard Deviation   32.42528982   9.749368282   245.3197219   781.8807534   1082.253253   2737.059103   0             0
Mean                 4.87337544    4.05191168    39.51898688   245.0792916   1553.098063   4167.07851    1226          1232
Second pass without outliers
Standard Deviation   1.375875503   0.8390193564  28.38377158   191.4744331   878.6703183   2132.654898   0             0
Mean                 3.487575109   3.418433003   7.550682037   96.65978872   1389.405618   3660.501404   1226          1232

 

Even once outliers are removed we’re still seeing a large latency from swift’s response.

The standard deviation in the requests has now gotten very small. We’ve clearly made a difference moving closer to the logserver.

 

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=963200514&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=1689771820&format=interactive]

Very nice and close.
 

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=346021785&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=10713262&format=interactive]

Here we can see that for roughly half the requests the response time was the same for swift as for the disk. It’s the other half of the requests bringing things down.
 

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=1488676353&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/16UtKwF-KaLAh22QpTbglhLYLjE_bwWRc702n8y8XAz4/pubchart?oid=1384537917&format=interactive]

The transfer for swift is consistently slower.

 

Rackspace Cloud, parallel requests of recent logs

Finally I ran just over a thousand requests in 5 parallel streams from computers near the logserver for recent logs.

 

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=1688949678&format=interactive]

Again the graph is too crowded to see what is happening so I took a rolling average.

 

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=1484304295&format=interactive]

 

                     request sent (ms)           response (ms)               transfer (ms)               size (KB)
                     disk          swift         disk          swift         disk          swift         disk          swift
Standard Deviation   0.7227904332  0.8900549012  434.8600827   909.095546    1913.9587     2132.992773   6.341238774   7.659678352
Mean                 3.515711867   3.56191383    145.5941102   189.947818    2427.776165   2875.289455   1219.940039   1221.384913
Second pass without outliers
Standard Deviation   0.4798803247  0.4966553679  109.6540634   171.1102999   1348.939342   1440.2851     6.137625464   7.565931993
Mean                 3.379718381   3.405770445   70.31323922   86.16522485   2016.900047   2426.312363   1220.318912   1221.881335

 

The averages here are much more reasonable than when we continually tried to request the same file. Perhaps we’re hitting limitations with swift’s serving abilities.

 

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=963200514&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=1689771820&format=interactive]

I’m not sure why we have a sinc function here. A network expert may be able to tell you more. As far as I know this isn’t important to our analysis, other than the fact that both disk and swift match.
 

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=346021785&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=10713262&format=interactive]

Here we can now see swift keeping a lot closer to disk results than when we only requested the one file in parallel. Swift is still, unsurprisingly, slower overall.
 

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=1488676353&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=1384537917&format=interactive]

Swift still loses out on transfers but again does a much better job of keeping up.
 

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=1494494491&format=interactive]

[Chart: https://docs.google.com/spreadsheets/d/1LXoyF-JausOJArkum-WlKpb19kxxVTau9y4Qled7kxc/pubchart?oid=604459439&format=interactive]

Error sources

I haven’t accounted for any of the following swift intricacies (in terms of caches etc) for:

  • Fetching random objects
  • Fetching the same object over and over
  • Fetching in parallel multiple different objects
  • Fetching the same object in parallel

I also haven’t done anything to account for things like file system caching, network profiling, noisy neighbours etc etc.

os-loganalyze tries to stay authenticated with swift, however:

  • This can timeout (causes delays while reconnecting, possibly accounting for some spikes?)
  • This isn’t thread safe (are we hitting those edge cases?)

We could possibly explore getting longer authentication tokens or having os-loganalyze pull from an unauthenticated CDN to add the markup and then serve. I haven’t explored those here though.

os-loganalyze also handles all of the requests not just from my testing but also from anybody looking at OpenStack CI logs. In addition to this it also needs to deflate the gzip stream if required. As such there is potentially a large unknown (to me) load on the log server.

In other words, there are plenty of sources of errors. However I just wanted to get a feel for the general responsiveness compared to fetching from disk. Both sources had noise in their results so it should be expected in the real world when downloading logs that it’ll never be consistent.

Conclusions

As you would expect the request times are pretty much the same for both disk and swift (as mentioned earlier) especially when sitting next to the log server.

The response times vary, but looking at the averages and the histograms these are rarely large. Even in the case where requesting the same file over and over in parallel caused responses to slow down, the difference was only on the order of 100ms.

The response time is the important one as it indicates how soon a download will start for the user. The total time to stream the contents of the whole log is seemingly less important if the user is able to start reading the file.

One thing that wasn’t tested was streaming of different file sizes. All of the files were roughly the same size (being logs of the same job). For example, what if the asset was a few gigabytes in size, would swift have any significant differences there? In general swift was slower to stream the file but only by a few hundred milliseconds for a megabyte. It’s hard to say (without further testing) if this would be noticeable on large files where there are many other factors contributing to the variance.

Whether or not these latencies are an issue is relative to how the user is using/consuming the logs. For example, if they are just looking at the logs in their web browser on occasion they probably aren’t going to notice a large difference. However if the logs are being fetched and scraped by a bot then it may see a decrease in performance.

Overall I’ll leave deciding on whether or not these latencies are acceptable as an exercise for the reader.

by Joshua Hesketh at October 21, 2014 11:44 AM

Opensource.com

How OpenStack powers the research at CERN

OpenStack has been in a production environment at CERN for more than a year. One of the people who has been key to implementing the OpenStack infrastructure is Tim Bell. He is responsible for the CERN IT Operating Systems and Infrastructure group, which provides a range of services to CERN users, from email, web and operating systems to the Infrastructure-as-a-Service cloud based on OpenStack.

We had a chance to interview Bell in advance of the OpenStack Summit Paris 2014 where he will deliver two talks. The first session is about cloud federation while the second session is about multi-cell OpenStack.

by jhibbets at October 21, 2014 11:00 AM

ICCLab

Numerical Dosimetry in the cloud

What is it all about?

We’re using a bunch of VMs to do numerical dosimetry and are very satisfied with the service and performance we get. Here I try to give some background on our work.
Picture yourself sitting in the dentist's chair for an x-ray image of your teeth. How much radiation will miss the x-ray film in your mouth and instead wander through your body? That's one type of question we try to answer with computer models, or numerical dosimetry, as we call it.

The interactions between ionizing radiation – e.g. x-rays – and atoms are well known. However, there is a great deal of randomness, so-called stochastic behavior. Let's go back to the dentist's chair and follow one single photon (that's the particle x-rays are composed of). This sounds a bit like ray tracing, but is way more noisy, as you'll see.

The image below shows a voxel phantom (built of Lego bricks made of bone, fat, muscle etc.) during a radiography of the left breast.


Tracing a photon

The photon is just about to leave the x-ray tube. We take a known distribution of photon energies, throw the dice and pick one energy at random. Then we decide – again by throwing the dice – how long the photon will fly until it comes close to an atom. How exactly will it hit the atom? Which of the many processes (e.g. Compton scattering) will take place? How much energy will be lost, and in what direction will it leave the atom? The answer – you may have already guessed it – lies in rolling the dice. We repeat the process until the photon has lost all its energy or leaves our model world.

During its journey the photon has created many secondary particles (e.g. electrons kicked out of an atomic orbit). We follow each of them, and their children, again. Finally, all particles have come to rest and we know in detail what happened to that single photon and to the matter it crossed. This process takes some 100 microseconds on an average cloud CPU.

Monte Carlo (MC)

This method of problem solving is called Monte Carlo, after the roulette tables. You apply MC whenever there are too many parameters to solve a problem in a deterministic way. One well-known application is the so-called raindrop Pi: by counting the fraction of random points that fall within a circle you can approach the number Pi (3.141).
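
As a toy illustration (this is not part of our dosimetry code), the raindrop estimate of Pi can be written in a few lines of Python:

import random

def raindrop_pi(n):
    """Estimate Pi from the fraction of random points inside the unit circle."""
    inside = sum(1 for _ in range(n)
                 if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4.0 * inside / n   # quarter-circle area ratio times 4

print(raindrop_pi(1000000))   # approaches 3.141... as n grows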

Back to the dentist: unfortunately, with our single photon we do not yet see any energy deposited in your thyroid gland (located at the front of your neck). This first photon passed, by pure chance, without any interaction. So we just start another one, 5,000 a second, 18 million per hour, and so on, until we have collected enough dose in your neck. Only a tiny fraction q of the N initial photons ends up in our target volume, and the energy deposit shows fluctuations that typically decrease proportionally to 1/sqrt(qN). So we need some 1E9 initial photons to have 1E5 in the target volume and a relative error smaller than 1 %. This would take 2 CPU days.
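
Plugging the numbers from this example into the 1/sqrt(qN) estimate gives roughly the figures quoted above:

import math

n_initial = 1e9               # photons started at the x-ray tube
q = 1e-4                      # fraction reaching the target volume (1E5 of 1E9)
photons_per_second = 5000.0   # roughly 100 microseconds per photon history

relative_error = 1.0 / math.sqrt(q * n_initial)      # about 0.3 %, i.e. below 1 %
cpu_days = n_initial / photons_per_second / 86400.0  # about 2.3 CPU days
print(relative_error, cpu_days)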

MC and the cloud

This type of MC problem is CPU bound and trivial to parallelize, since the photons are independent of each other (remember that a drop of water contains 1E23 molecules; our 1E9 photons will not disturb that). So with M CPUs the waiting time is simply reduced by a factor of M. In the above example, with 50 CPUs I have a result after 1 hour instead of 2 days.

On the one hand this is quantitative progress. On the other hand, and more important for my work, is the progress in quality: during one day I can play with 10 different scenarios, concentrate on problem solving and not waste time unwinding the stack in my head after a week. The cloud helps to improve the quality of our work.

Practical considerations

The code we use is Geant4 (geant4.cern.ch), a free C++ library to propagate particles through matter. Code development is done locally (e.g. Ubuntu in a virtual box) and then uploaded with rsync to the master node.

Our CPUs are distributed over several virtual machines deployed in ICCLab’s OpenStack cloud. From the master we distribute code and collect results via rsync, job deployment and status is done through small bash scripts. The final analysis is then done locally with Matlab.
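
The fan-out itself is simple. Our actual tooling is a handful of small bash scripts as mentioned above, but the idea is roughly captured by this Python sketch (hostnames and run_batch.sh are made up for illustration):

import subprocess

NODES = ['vm01', 'vm02', 'vm03']   # hypothetical worker VM hostnames

for i, node in enumerate(NODES):
    # Push the Geant4 application and geometry to the worker...
    subprocess.check_call(['rsync', '-a', 'build/', node + ':dosim/'])
    # ...and start its batch in the background; the seed offset keeps the
    # photon histories on different nodes statistically independent.
    subprocess.Popen(['ssh', node, 'cd dosim && ./run_batch.sh %d' % i])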

Code deployment and result collection is done within 30 seconds, which is negligible compared to run times of hours. So even on the job scale our speedup is M.

by Patrik Eschle at October 21, 2014 09:05 AM

Mirantis

Mirantis Raises $100 Million Series B, Challenging Incumbents as the Pure-Play OpenStack Leader

Insight Venture Partners Leads Largest Series B in Open Source Software History

Mirantis, the pure-play OpenStack company, today announced $100 million in Series B funding led by Insight Venture Partners. The financing is the largest Series B open source investment in history, and one of the largest Series B investments in B2B software, validating Mirantis as the breakaway independent pure-play OpenStack vendor. Insight Venture Partners was joined by August Capital, as well as existing investors Intel Capital, WestSummit Capital, Ericsson, and Sapphire Ventures (formerly SAP Ventures). Alex Crisses, managing director at Insight Venture Partners, will join the Mirantis board of directors.

“OpenStack adoption is accelerating worldwide, driven by the need for low cost, scalable cloud infrastructure. 451 Research estimates a market size of $3.3 billion by 2018,” said Alex Crisses. “Mirantis delivers on OpenStack’s promise of cloud computing at a fraction of the time and cost of traditional IT vendors, and without the compromise of vendor lock-in. Their customer traction has been phenomenal.”

“Mirantis is already leading the OpenStack ecosystem. We are committed to helping it become the principal cloud vendor,” said Vivek Mehra, general partner at August Capital, whose partners have helped grow Splunk, Microsoft, Sun Microsystems, Seagate, Skype, and Tegile Systems into dominant technology players. “Its unique pure-play approach will trump the lock-in of traditional IT vendors.”

Mirantis has helped more than 130 customers implement OpenStack – more than any other vendor –  including Comcast, DirectTV, Ericsson, Expedia, NASA, NTT Docomo, PayPal, Symantec, Samsung, WebEx and Workday.  Among these is the largest OpenStack deal on record: a five-year software licensing agreement with Ericsson. Mirantis is also the largest provider of OpenStack products and services for the telecommunications industry, serving Huawei, NTT Docomo, Orange, Pacnet, Tata Communications, and others.

“Our mission is to move companies from an expensive, lock-in infrastructure to an open cloud that empowers developers and end-users at a fraction of the cost. Customers are seeing the value; we’ve gone from signing about $1 million in new business every month to $1 million every week,” said Mirantis President and CEO, Adrian Ionel. “People choose us because we have the best software and expertise for OpenStack, foster an open partner ecosystem, and are a major upstream contributor, influencing the technology’s direction.”

“Mirantis OpenStack is the only truly hardened and commercially-supported OpenStack distribution today that you can just download from the website, install using an intuitive GUI driven process and be up and running in no time,” said Nicholas Summers, Cloud Architect at Home Depot, a Mirantis customer. “With everyone else, you either get raw upstream code or need to engage in an elaborate sales discussion before even getting your hands on the commercial version.”

Mirantis will use the funds to double its engineering investments. It will focus on development of its zero lock-in OpenStack software, including its downloadable distribution, Mirantis OpenStack, and its on-demand, hosted option, Mirantis OpenStack Express. Mirantis is currently the No. 3 contributor to OpenStack and will continue contributing to the community, with particular focus on enterprise-grade reliability and ease-of-use. The funds will also be used to accelerate international expansion in Europe and Asia-Pacific, deepen its bench of support engineers, and grow its open partner ecosystem.

“Driving the accessibility of software defined infrastructure and cloud computing to data centers around the world is an imperative for Intel,” said Jason Waxman, vice president of Intel’s Data Center Group and general manager of Intel’s Cloud Platforms Group. “Mirantis plays a key role in the OpenStack movement, and our investment is designed to accelerate industry adoption of cost-effective workload orchestration solutions.”

About Mirantis
Mirantis is the world’s leading OpenStack company. Mirantis delivers all the software, services, training and support needed for running OpenStack. More customers rely on Mirantis than any other company to get to production deployment of OpenStack at scale. Among the top three companies worldwide in contributing open source software to OpenStack, Mirantis has helped build and deploy some of the largest OpenStack clouds at companies such as Cisco, Comcast, DirectTV, Ericsson, Expedia, NASA, NTT Docomo, PayPal, Symantec, Samsung, WebEx and Workday.

Mirantis is venture-backed by Insight Venture Partners, August Capital, Ericsson, Red Hat, Intel Capital, Sapphire Ventures and WestSummit Capital, with headquarters in Mountain View, California. For more information, visit www.mirantis.com or follow us on Twitter at @mirantisit.

Contact Information:
Sarah Bennett
PR Manager, Mirantis
sbennett@mirantis.com

The post Mirantis Raises $100 Million Series B, Challenging Incumbents as the Pure-Play OpenStack Leader appeared first on Mirantis | The #1 Pure Play OpenStack Company.

by Sarah Bennett at October 21, 2014 07:01 AM

October 20, 2014

Florent Flament

Splitting Swift cluster

At Cloudwatt, we have been operating a Swift cluster of nearly one hundred nodes in a single datacenter for a few years. The decision to split the cluster across two datacenters was taken recently. The goal is to have at least one replica of each object on each site, in order to avoid data loss in case of the destruction of a full datacenter (fire, plane crash, ...).

Constraints when updating a running cluster

Some precautions have to be taken when updating a running cluster with customers' data. We want to ensure that no data is lost or corrupted during the operation and that the cluster's performance isn't hurt too badly.

In order to ensure that no data is lost, we have to follow some guidelines including:

  • Never move more than 1 replica of any object at any given step; that way we ensure that 2 copies out of 3 are left intact in case something goes wrong.
  • Process by small steps to limit the impact in case of failure.
  • Check during each step that there is no unusual data corruption, and that corrupted data is correctly handled and fixed.
  • Check after each step that data has been moved (or kept) at the correct place.
  • If any issue were to happen, rollback to previous step.

To limit the impact on the cluster's performance, we have to address the following issues:

  • Assess the availability of cluster resources (network bandwidth, storage nodes' disk and CPU availability) at different times of the day and week. This allows us to choose the best time to perform our steps.
  • Assess the load on the cluster of the steps planned to split the cluster.
  • Choose steps small enough so that:
    • it fits time frames where cluster's resources are more available;
    • the load incurred by the cluster (and its users) is acceptable.

A number of these requirements have been addressed by Swift for a while:

  • When updating Swift ring files, the swift-ring-builder tool doesn't move more than 1 replica during reassignment of the cluster's partitions (unless something really went wrong). By performing only one reassignment per process step, we ensure that we don't move more than 1 replica at each step.
  • Checking for data corruption is made easy by Swift: 3 processes (swift-object-auditor, swift-container-auditor and swift-account-auditor) running on storage nodes are continuously checking and fixing data integrity.
  • Checking that data is at the correct location is also made easy by the provided swift-dispersion-report tool.
  • Updating the location of data is made seamless by updating and copying the Ring files to every Swift node. Once updated, the Ring files are loaded by Swift processes without needing to be restarted. Rolling back data location is easily performed by replacing the new Ring files with the previous ones.

However, being able to control the amount of data moved to a new datacenter at a given step is a brand new capability, completed in version 2.2.0 of Swift, released on October 4th, 2014.

Checking data integrity

Swift auditor processes (swift-object-auditor, swift-container-auditor and swift-account-auditor) running on storage nodes are continuously checking data integrity, by checking files' checksums. When a corrupted file is found, it is quarantined; the data is removed from the node and the replication mechanism takes care of replacing the missing data. Below is an example of what concretely happens when manually corrupting an object.

Let's corrupt data by hand:

root@swnode0:/srv/node/d1/objects/154808/c3a# cat 972e359caf9df6fdd3b8e295afd4cc3a/1410353767.57579.data
blabla
root@swnode0:/srv/node/d1/objects/154808/c3a# echo blablb > 972e359caf9df6fdd3b8e295afd4cc3a/1410353767.57579.data

The corrupted object is 'quarantined' by the object-auditor when it checks the files integrity. Here's how it appears in the /var/log/syslog log file:

Sep 10 13:56:44 swnode0 object-auditor: Quarantined object /srv/node/d1/objects/154808/c3a/972e359caf9df6fdd3b8e295afd4cc3a/1410353767.57579.data: ETag 9b36b2e89df94bc458d629499d38cf86 and file's md5 6235440677e53f66877f0c1fec6a89bd do not match
Sep 10 13:56:44 swnode0 object-auditor: ERROR Object /srv/node/d1/objects/154808/c3a/972e359caf9df6fdd3b8e295afd4cc3a failed audit and was quarantined: ETag 9b36b2e89df94bc458d629499d38cf86 and file's md5 6235440677e53f66877f0c1fec6a89bd do not match
Sep 10 13:56:44 swnode0 object-auditor: Object audit (ALL) "forever" mode completed: 0.02s. Total quarantined: 1, Total errors: 0, Total files/sec: 46.71, Total bytes/sec: 326.94, Auditing time: 0.02, Rate: 0.98

The quarantined object is then overwritten by the object-replicator of a node that has the appropriate replica uncorrupted. Below is an extract of the log file on such node:

Sep 10 13:57:01 swnode1 object-replicator: Starting object replication pass.
Sep 10 13:57:01 swnode1 object-replicator: <f+++++++++ c3a/972e359caf9df6fdd3b8e295afd4cc3a/1410353767.57579.data
Sep 10 13:57:01 swnode1 object-replicator: Successful rsync of /srv/node/d1/objects/154808/c3a at 192.168.100.10::object/d1/objects/154808 (0.182)
Sep 10 13:57:01 swnode1 object-replicator: 1/1 (100.00%) partitions replicated in 0.21s (4.84/sec, 0s remaining)
Sep 10 13:57:01 swnode1 object-replicator: 1 suffixes checked - 0.00% hashed, 100.00% synced
Sep 10 13:57:01 swnode1 object-replicator: Partition times: max 0.2050s, min 0.2050s, med 0.2050s
Sep 10 13:57:01 swnode1 object-replicator: Object replication complete. (0.00 minutes)

The corrupted data has been replaced by the correct data on the initial storage node (where the file had been corrupted):

root@swnode0:/srv/node/d1/objects/154808/c3a# cat 972e359caf9df6fdd3b8e295afd4cc3a/1410353767.57579.data
blabla

Checking data location

Preparation

We can use the swift-dispersion-report tool provided with Swift to monitor our data dispersion ratio (the ratio of objects on the proper device to the total number of objects). A dedicated OpenStack account is required; it will be used by swift-dispersion-populate to create containers and objects.

Then we configure the swift-dispersion-report tool appropriately in the /etc/swift/dispersion.conf file:

[dispersion]
auth_url = http://SWIFT_PROXY_URL/auth/v1.0
auth_user = DEDICATED_ACCOUNT_USERNAME
auth_key = DEDICATED_ACCOUNT_PASSWORD

Once properly set, we can initiate dispersion monitoring by populating our new account with test data:

cloud@swproxy:~$ swift-dispersion-populate
Created 2621 containers for dispersion reporting, 4m, 0 retries
Created 2621 objects for dispersion reporting, 2m, 0 retries

Our objects should have been placed on appropriate devices. We can check this:

cloud@swproxy:~$ swift-dispersion-report
Queried 2622 containers for dispersion reporting, 2m, 31 retries
100.00% of container copies found (7866 of 7866)
Sample represents 1.00% of the container partition space
Queried 2621 objects for dispersion reporting, 45s, 1 retries
There were 2621 partitions missing 0 copy.
100.00% of object copies found (7863 of 7863)
Sample represents 1.00% of the object partition space

Monitoring data redistribution

Once the updated Ring has been pushed to every storage node and proxy server, we can follow the data redistribution with swift-dispersion-report. The migration is complete when the percentage of object copies found reaches 100%. Here's an example of the results obtained on a 6-node cluster.

cloud@swproxy:~$ swift-dispersion-report
Queried 2622 containers for dispersion reporting, 3m, 29 retries
100.00% of container copies found (7866 of 7866)
Sample represents 1.00% of the container partition space
Queried 2621 objects for dispersion reporting, 33s, 0 retries
There were 23 partitions missing 0 copy.
There were 2598 partitions missing 1 copy.
66.96% of object copies found (5265 of 7863)
Sample represents 1.00% of the object partition space

# Then some minutes later
cloud@swproxy:~$ swift-dispersion-report
Queried 2622 containers for dispersion reporting, 5m, 0 retries
100.00% of container copies found (7866 of 7866)
Sample represents 1.00% of the container partition space
Queried 2621 objects for dispersion reporting, 26s, 0 retries
There were 91 partitions missing 0 copy.
There were 2530 partitions missing 1 copy.
67.82% of object copies found (5333 of 7863)
Sample represents 1.00% of the object partition space

Limiting the amount of data to move

A number of recent contributions have been made to Swift in order to allow the smooth addition of nodes to a new region.

With versions of swift-ring-builder earlier than Swift 2.1, when adding a node to a new region, 1 replica of every object was moved to the new region in order to maximize the dispersion of objects across regions. Such an algorithm had severe drawbacks. Consider a one-region Swift cluster with 100 storage nodes. Adding 1 node to a second region had the effect of transferring 1/3 of the cluster's data to the new node, which would not have the capacity to store the data previously distributed over 33 nodes. So in order to add a new region to our cluster, we had to add, in a single step, enough nodes to store 1/3 of our data. Suppose we add 33 nodes to the new region. While there is enough capacity on these nodes to receive 1 replica of every object, such an operation would trigger the transfer of Petabytes of data to the new nodes. With a 10 Gigabits/second link between the 2 datacenters, such a transfer would take days if not weeks, during which the cluster's network and the destination nodes' disks would be saturated.

With commit 6d77c37 ("Let admins add a region without melting their cluster"), released with Swift 2.1, the number of partitions assigned to nodes in a new region is determined by the weights of the nodes' devices. This feature allowed a Swift cluster operator to limit the amount of data transferred to a new region. However, because of bug 1367826 ("swift-ringbuilder rebalance moves 100% partitions when adding a new node to a new region"), even when limiting the amount of data transferred to the new region, a large amount of data was moved needlessly inside the initial region. For instance, it could happen that after a swift-ring-builder rebalance operation, 3% of partitions were assigned to the new region, but 88% of partitions were reassigned between nodes inside the first region. This would needlessly load the cluster's network and storage nodes.

Eventually, commit 20e9ad5 ("Limit partition movement when adding a new tier") fixed bug 1367826. This commit was released with Swift 2.2. It allows an operator to choose the amount of data that flows between regions when adding nodes to a new region, without side effects. This enables the operator to perform a multi-step cluster split: first add devices with very low weights to the new region, then progressively increase the weights step by step, until 1 replica of every object has been transferred to the new region. Since the number of partitions assigned to the new region depends on the weights assigned to the new devices, the operator has to compute the appropriate weights.

Computing new region weight for a given ratio of partitions

In order to assign a given ratio of partitions to a new region, a Swift operator can compute the devices' weights by using the following formula.

Given:

  • w1 is the weight of a single device in region r1
  • r1 has n1 devices
  • W1 = n1 * w1 is the full weight of region r1
  • r2 has n2 devices
  • w2 is the weight of a single device in region r2
  • W2 = n2 * w2 is the full weight of region r2
  • r is the ratio of partitions we want in region r2

We have:

  • r = W2 / (W1+W2)
  • <=> W2 = r*W1 / (1-r)
  • <=> w2 = r*W1 / ((1-r)*n2)

w2 is the weight to set to each device of region r2
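
As a quick sanity check, here is a minimal Python sketch of this formula (the function name and the example figures are purely illustrative; they are not taken from the scripts mentioned later in this post):

# Illustrative helper: per-device weight w2 so that a new region
# receives a ratio r of all partitions.
def weight_for_partition_ratio(r, region1_total_weight, region2_device_count):
    # w2 = r * W1 / ((1 - r) * n2)
    if not 0 < r < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    return r * region1_total_weight / ((1 - r) * region2_device_count)

# Example: W1 = 400000, 12 devices in the new region, 3% of partitions wanted.
print(weight_for_partition_ratio(0.03, 400000, 12))  # ~1030.93 per device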

Computing new devices weight for a given number of partitions

In some cases the operator may prefer to specify the number of partitions (rather than a ratio) that he wishes to assign to the devices of the new region.

Given:

  • p1 the number of partitions in region r1
  • W1 the full weight of region r1
  • p2 the number of partitions in region r2
  • W2 the full weight of region r2

We have the following equality:

  • p1/W1 = p2/W2
  • <=> W2 = W1*p2 / p1
  • <=> w2 = W1*p2 / (n2*p1)

w2 is the weight to set to each device of region r2
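
Here is the same kind of illustrative Python sketch for the partition-count variant (again, the names and figures are examples, not values from a real cluster):

# Illustrative helper: per-device weight w2 so that a new region
# receives p2 partitions, given p1 partitions in region r1.
def weight_for_partition_count(p1, region1_total_weight, p2, region2_device_count):
    # w2 = W1 * p2 / (n2 * p1)
    return region1_total_weight * p2 / (region2_device_count * p1)

# Example: p1 = 262144, W1 = 400000, 12 partitions wanted on 12 new devices.
print(weight_for_partition_count(262144, 400000, 12, 12))  # ~1.53 per device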

Some scripts to compute weights automatically

I made some Swift scripts available to facilitate adding nodes to a new region. swift-add-nodes.py allows adding nodes to a new region with a minimal weight so that only 1 partition will be assigned to each device (the number and names of devices are set in a constant at the beginning of the script and have to be updated). Then swift-assign-partitions.py allows assigning a chosen ratio of partitions to the new region.

Example of deployment

Here's an example of the steps a Swift operator can follow in order to smoothly split a one-region cluster into 2 regions. A first step may consist in adding some new nodes to the new region and assigning 1 partition to each device. This would typically move between hundreds of Megabytes and a few Gigabytes of data, allowing the operator to check that everything (network, hardware, ...) is working as expected. We can use the swift-add-nodes.py script to easily add nodes to our new region with a minimal weight so that only 1 partition will be assigned to each device:

$ python swift-add-nodes.py object.builder object.builder.s1 2 6000 127.0.0.1 127.0.0.2 127.0.0.3
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.1', 'region': 2, 'device': 'sdb1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.1', 'region': 2, 'device': 'sdc1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.1', 'region': 2, 'device': 'sdd1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.1', 'region': 2, 'device': 'sde1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.2', 'region': 2, 'device': 'sdb1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.2', 'region': 2, 'device': 'sdc1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.2', 'region': 2, 'device': 'sdd1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.2', 'region': 2, 'device': 'sde1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.3', 'region': 2, 'device': 'sdb1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.3', 'region': 2, 'device': 'sdc1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.3', 'region': 2, 'device': 'sdd1', 'port': 6000}
Adding device: {'weight': 5.11, 'zone': 0, 'ip': '127.0.0.3', 'region': 2, 'device': 'sde1', 'port': 6000}

$ swift-ring-builder object.builder.s1 rebalance
Reassigned 12 (0.00%) partitions. Balance is now 0.18.

Subsequent steps may consist in increasing the partition ratio by steps of a few percent (let's say 3%) until one third of the total cluster data is stored in the new region. The swift-assign-partitions.py script allows assigning a chosen ratio of partitions to the new region:

$ python swift-assign-partitions.py object.builder.s2 object.builder.s3 2 0.03
Setting new weight of 10376.28 to device 1342
Setting new weight of 10376.28 to device 1343
Setting new weight of 10376.28 to device 1344
Setting new weight of 10376.28 to device 1345
Setting new weight of 10376.28 to device 1346
Setting new weight of 10376.28 to device 1347
Setting new weight of 10376.28 to device 1348
Setting new weight of 10376.28 to device 1349
Setting new weight of 10376.28 to device 1350
Setting new weight of 10376.28 to device 1351
Setting new weight of 10376.28 to device 1352
Setting new weight of 10376.28 to device 1353

$ swift-ring-builder object.builder.s3 rebalance
Reassigned 25119 (9.58%) partitions. Balance is now 0.25.

Thanks & related links

Special thanks to Christian Schwede for the awesome work he did to improve the swift-ring-builder.

Interested in more details about how the OpenStack Swift Ring works?

Want to know more about all of this? Come see our talk Using OpenStack Swift for Extreme Data Durability at the next OpenStack Summit in Paris!

by Florent Flament at October 20, 2014 09:25 PM

OpenStack Blog

OpenStack Workshop At Grace Hopper Open Source Day 2014

This year, OpenStack participated in Open Source Day (OSD) at the Grace Hopper Celebration of Women in Computing (GHC) for the second time. The main focus of this year’s Open Source Day was humanitarian applications. Along with OpenStack, participating open source projects included Microsoft Disaster Recovery, Ushahidi, Sahana Software Foundation and others.

As important as it is to build humanitarian applications, it is equally important that they are up and running in times of need and disaster. Hence, the OpenStack code-a-thon focused on building fault tolerant and scalable architectures using servers, databases and load balancers.

The six-hour code-a-thon started at 12:30 p.m. on October 8. OpenStack had more than 55 participants, ranging from college and university students, to professors and teachers, to professionals from various software companies. The day kicked off with a presentation by Egle Sigler on the basics of cloud computing and OpenStack, and what factors one must keep in mind when designing a fault tolerant architecture.

We divided the participants into smaller groups of five to six, and each group had a dedicated group volunteer. We had two activities planned out for the OSD. During the first activity, the participants wrote a Python script to deploy two web servers with an HAProxy server and a database server. The second activity involved deploying a demo Ushahidi application on cloud servers using Heat templates. Along with completing these activities, the participants were encouraged to architect and deploy their own solutions on the public cloud.

We had some pre-written base code in the GitHub repository to help the participants get started. We used OpenStack-powered Rackspace Cloud Servers for deployments. Some of the participants were more adventurous and even wrote code to back up their information using Swift/Cloud Files.

The participants were from different skill levels. For some of them it was their first time getting accustomed to the command line and using git; whereas for some it was their first time trying out OpenStack. Everyone who attended the code-a-thon got to learn something new!

At the end of the day, one of the participants, Yanwei Zhang, demoed how after decommissioning one of the two Apache servers the application still could be accessed using the load balancer IP.

We received some great feedback from the participants. Here are some of the responses we received in an anonymous survey:

Got to learn about OpenStack deployment and meet some great women.

It was fun learning something new.

I liked the participation of the volunteers, their experience was great to hear!

The Open Source Day would not have been possible without the help of the amazing volunteers who inspired the participants to keep hacking and learning. One of the participants mentioned: “The helpers were awesome, very positive, and obviously very enthusiastic about their work. Good job.” Overall, we had 14 volunteers, a mix of Rackers and graduates from the GNOME OPW program: Victoria Martínez de la Cruz, Jenny Vo, Sabeen Syed, Anne Gentle, Dragana Perez, Riddhi Shah, Zaina Afoulki, Lisa Clark, Zabby Damania, Cindy Pallares-Quezada, Besan Abu Radwan, Veera Venigalla, Carla Crull and Benita Daniel.

This is a post written and contributed by Egle Sigler and Iccha Sethi.

Egle Sigler is a Principal Architect on a Private Cloud Solutions team at Rackspace. In addition to working with OpenStack and related technologies, Egle is a governing board member for POWER (Professional Organization of Women Empowered at Rackspace), Rackspace’s internal employee resource group dedicated to empowering women in technology. Egle holds a M.S. degree in Computer Science.

Iccha Sethi is a long time contributor to OpenStack and has worked on the Cloud Images (Glance) and Cloud Databases (Trove) OpenStack products at Rackspace. She has been involved in several community initiatives including being a mentor for GNOME OPW program and is the founder of Let’s Code Blacksburg!

by Iccha Sethi at October 20, 2014 08:17 PM

Percona

Autumn: A season of MySQL-related conferences. Here’s my list

Autumn is a season of MySQL-related conferences and I’m about to hit the road to speak and attend quite a  few of them.

This week I'll participate in All Things Open, a local conference for me here in Raleigh, N.C., and therefore one I do not have to travel for. All Things Open explores open source, open tech and the open web in the enterprise. I'll be speaking on SSDs for Databases at 3:15 p.m. on Thursday, Oct. 23, and I'll also be participating in a book signing for the High Performance MySQL book at 11:45 p.m. at the "Meet the Speaker" table. We are also proud to be a sponsor of this show, so please stop by and say "Hi" at our booth in the expo hall.

Following this show I go to Moscow, Russia for the Highload++ conference. This is a wonderful show for people interested in high-performance solutions for Internet applications and I attend almost every year. It has a great lineup of speakers from leading Russian companies as well as many top international speakers covering a lot of diverse technologies. I have 3 talks at this show, around Application Architecture, Using Indexes in MySQL, and SSD and Flash Storage for Databases. I'm looking forward to reconnecting with my many Russian friends at this show.

From Highload I go directly to Percona Live London 2014 (Nov. 3-4), which is the show we're putting together – which of course means it is filled with great in-depth information about MySQL and its variants. I think this year we have a good balance of talks from MySQL users such as Facebook, Github, Booking.com, Ebay, Spil Games and the IE Domain Registry, as well as vendors with in-depth information about their products and experience with many customer environments – MySQL @ Oracle, HP, HGST, Percona, MariaDB, Pythian, Codership, Continuent, Tokutek, FromDual, OlinData. It looks like it is going to be a great show (though of course I'm biased), so do not forget to get registered if you have not already. (On Twitter use hashtag #PerconaLive)

The show I'm sorry to miss is the OpenStack Paris Summit. Even though it is so close to London, the additional visa logistics make it unfeasible for me to visit. There will be a fair number of Perconians at the show, though. Our guys will be speaking about a MySQL and OpenStack Deep Dive as well as Percona Server Features for OpenStack and Trove Ops. We're also exhibiting at this show, so please stop by our booth and say "hi."

Finally there is AWS re:Invent in Las Vegas Nov. 11-14. I have not submitted any talks for this one, but I'll drop in for a day to check it out. We're also exhibiting at this show, so if you're around please stop by and say "hi."

This is going to be quite a busy month with a lot of events! There are actually more where we're speaking or attending. If you're interested in the events we're participating in, there is a page on our website that tells you just that! I also invite you to submit papers to speak at the new OpenStack Live 2015 conference April 13-14, which runs parallel to the annual Percona Live MySQL Conference and Expo 2015 April 13-16 – both at the Hyatt Regency Santa Clara & The Santa Clara Convention Center in Silicon Valley.

The post Autumn: A season of MySQL-related conferences. Here’s my list appeared first on MySQL Performance Blog.

by Peter Zaitsev at October 20, 2014 02:58 PM

Opensource.com

OpenStack Juno is here, preparing for Summit, and more

Interested in keeping track of what's happening in the open source cloud? Opensource.com is your source for what's happening right now in OpenStack, the open source cloud infrastructure project.

by Jason Baker at October 20, 2014 07:00 AM

October 17, 2014

OpenStack Blog

OpenStack Community Weekly Newsletter (Oct 10 – 17)

OpenStack Juno is here!

OpenStack Juno, the tenth release of the open source software for building public, private, and hybrid clouds has 342 new features to support software development, big data analysis and application infrastructure at scale. The OpenStack community continues to attract the best developers and experts in their disciplines with 1,419 individuals employed by more than 133 organizations contributing to the Juno release.

Tweaking DefCore to subdivide OpenStack platform (proposal for review)

For nearly two years, the OpenStack Board has been moving towards creating a common platform definition that can help drive interoperability. At the last meeting, the Board paused to further review one of the core tenets of the DefCore process (Item #3: Core definition can be applied equally to all usage models). The following material will be a major part of the discussion for the OpenStack Board meeting on Monday 10/20. Comments and suggestions welcome!

Forming the OpenStack API Working Group

A new working group about APIs is forming in the OpenStack community. Its purpose is “To propose, discuss, review, and advocate for API guidelines for all OpenStack Programs to follow.” To learn more read the API Working Group wiki page.

End of the Election Cycle – Results of PTL & TC Elections

Lots of confirmations and some new names. Thank you to all who served in the past cycle, and welcome to the new OpenStack Tech Leads and members of the Technical Committee.

The Road To Paris 2014 – Deadlines and Resources

During the Paris Summit there will be a working session for the Women of OpenStack to frame up more defined goals and line out a blueprint for the group moving forward. We encourage all women in the community to complete this very short survey to provide input for the group.

Report from Events

Relevant Conversations

Tips ‘n Tricks

Security Advisories and Notices

Upcoming Events

Other News

Got Answers?

Ask OpenStack is the go-to destination for OpenStack users. Interesting questions waiting for answers:

Welcome New Reviewers, Developers and Core Reviewers

Dominique Savanna Jenkins
Andrew Boik Marcin Karkocha
Nelly Dmitry Nikishov
dominik dobruchowski Cory Benfield
mfabros Richard Winters
Nikolay Fedotov vinod kumar
Imran Hayder Wayne Warren
Chaitanya Challa Carol Bouchard
Shaunak Kashyap pradeep gondu
Mudassir Latif Vineet Menon
Jiri Suchomel Evan Callicoat
Edmond Kotowski
Julien Anguenot
Boris Bobrov
Rajini Ram
Nikki
Martin Hickey
Lena Novokshonova
Jin Liu
Hao Chen
Albert

OpenStack Reactions

After the new release is announced

The weekly newsletter is a way for the community to learn about all the various activities occurring on a weekly basis. If you would like to add content to a weekly update or have an idea about this newsletter, please leave a comment.

by Stefano Maffulli at October 17, 2014 10:24 PM

IBM OpenStack Team

OpenStack guest CPU topology configuration: Part two

In my previous post, I introduced the guest CPU topology configuration feature developed for the Juno release of OpenStack. As a reminder, the specification for this feature can be read here.

This feature allows administrators and users to specify the CPU topology configured for an OpenStack virtual machine (VM). Initially, this is targeted to libvirt and KVM hypervisors, but presumably could be supported on other OpenStack supported hypervisors over time.

I’ve backported these changes to Icehouse, as that is what our OpenStack PowerKVM continuous integration (CI) testing infrastructure is built around. There is a desire to take advantage of this feature in IBM PowerKVM CI to improve our hardware utilization. These backported changes are available at my github, but please use them only at your own risk. As Juno is imminent (October 16, 2014 target date), it’s much preferable to just wait for the official release to try out this feature yourself.

A few examples

Let’s start by attempting to create a VM with four threads. First, let’s create a four-vCPU flavor named “test.vcpu4” to do our experimentation.

[root@p8-dev ~(keystone_admin)]# nova flavor-create test.vcpu4 110 8192 80 4
+-----+------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID  | Name       | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+-----+------------+-----------+------+-----------+------+-------+-------------+-----------+
| 110 | test.vcpu4 | 8192      | 80   | 0         |      | 4     | 1.0         | True      |
+-----+------------+-----------+------+-----------+------+-------+-------------+-----------+

Next, let’s specify that we prefer four threads for this new flavor by configuring the ‘hw:cpu_threads’ option on our new test.vcpu4 flavor.

# nova flavor-key test.vcpu4 set hw:cpu_threads=4
# nova flavor-show test.vcpu4
+----------------------------+---------------------------+
| Property                   | Value                     |
+----------------------------+---------------------------+
| name                       | test.vcpu4                |
| ram                        | 8192                      |
| OS-FLV-DISABLED:disabled   | False                     |
| vcpus                      | 4                         |
| extra_specs                | {u'hw:cpu_threads': u'4'} |
| swap                       |                           |
| os-flavor-access:is_public | True                      |
| rxtx_factor                | 1.0                       |
| OS-FLV-EXT-DATA:ephemeral  | 0                         |
| disk                       | 80                        |
| id                         | 102                       |
+----------------------------+---------------------------+

Let’s boot a Fedora 20 Linux image with this new flavor.

# nova boot --image jgrimm.f20 --flavor test.vcpu4 jgrimm-test

And let’s verify that the CPU topology is now one socket with one core of four threads.

[fedora@jgrimm-test ~]$ lscpu
Architecture:          ppc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 IBM pSeries (emulated by qemu)
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-3

As a second example, let’s override the flavor behavior and specify that we’d prefer two threads for our test image “jgrimm.f20.”

# nova image-meta jgrimm.f20 set hw_cpu_threads=2
# nova image-show jgrimm.f20
+-----------------------------+--------------------------------------+
| Property                    | Value                                |
+-----------------------------+--------------------------------------+
| status                      | ACTIVE                               |
| metadata extra_args         | console=hvc0 console=tty0            |
| updated                     | 2014-09-08T17:55:05Z                 |
| metadata arch               | ppc64                                |
| name                        | jgrimm.f20                           |
| created                     | 2014-09-08T16:12:18Z                 |
| minDisk                     | 0                                    |
| metadata hw_cpu_threads     | 2                                    |
| metadata hypervisor_type    | kvm                                  |
| progress                    | 100                                  |
| minRam                      | 0                                    |
| OS-EXT-IMG-SIZE:size        | 2350710784                           |
| id                          | bd43c8cb-0766-4a7c-a086-c96bc1c55ac2 |
+-----------------------------+--------------------------------------+

#nova boot --flavor test.vcpu4 --image jgrimm.f20 jgrimm-test2

The resulting “lscpu” output from our new guest is:

[root@jgrimm-test2 ~]# lscpu
Architecture:          ppc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          1
Model:                 IBM pSeries (emulated by qemu)
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-3

Notice that in this example, our four-vCPU request has been satisfied by a topology with two sockets, each with one core of two threads. Why did two sockets get chosen over a configuration with two cores? What is happening is that Nova will choose to prioritize configuring sockets over cores and cores over threads.
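
A rough way to picture that priority order is to enumerate the factorizations of the vCPU count that respect the configured maximums and pick the one with the most sockets, breaking ties on cores. The snippet below is only an illustrative sketch of the behavior, not Nova's actual implementation:

# Illustrative sketch of the "sockets over cores over threads" preference;
# this is NOT Nova's code, just a way to reason about the observed behavior.
def possible_topologies(vcpus, max_sockets=65536, max_cores=65536, max_threads=65536):
    for sockets in range(1, min(vcpus, max_sockets) + 1):
        if vcpus % sockets:
            continue
        for cores in range(1, min(vcpus // sockets, max_cores) + 1):
            if (vcpus // sockets) % cores:
                continue
            threads = vcpus // (sockets * cores)
            if threads <= max_threads:
                yield (sockets, cores, threads)

def preferred(vcpus, **limits):
    # More sockets win first, then more cores, then more threads.
    return max(possible_topologies(vcpus, **limits))

print(preferred(4))                 # (4, 1, 1): the default behavior
print(preferred(4, max_sockets=2))  # (2, 2, 1): sockets still favored over threads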

This preference behavior can be a bit surprising at times. I’ll work through such a situation in my next post, and will provide some concluding thoughts and references.

Please leave a comment below to join the conversation.

The post OpenStack guest CPU topology configuration: Part two appeared first on Thoughts on Cloud.

by Jon Grimm at October 17, 2014 04:08 PM

Opensource.com

OpenStack Summit interview series: the road to Kilo

It's a busy time of the year for OpenStack, with the Juno release just out the door and planning for the upcoming Kilo release already underway. In celebration of the new release and the OpenStack Summit in Paris on November 3-7, Opensource.com is featuring a number of interviews with key speakers at the event.

by Jason Baker at October 17, 2014 07:00 AM

October 16, 2014

The Official Rackspace Blog » OpenStack

OpenStack Workshop At Grace Hopper Open Source Day 2014

This year, OpenStack participated in Open Source Day (OSD) at the Grace Hopper Celebration of Women in Computing (GHC) for the second time. The main focus of this year’s Open Source Day was humanitarian applications. Along with OpenStack, participating open source projects included Microsoft Disaster Recovery, Ushahidi, Sahana Software Foundation and others.

As important as it is to build humanitarian applications, it is equally important that they are up and running in times of need and disaster. Hence, the OpenStack code-a-thon focused on building fault tolerant and scalable architectures using servers, databases and load balancers.

The six-hour code-a-thon started at 12:30 p.m. on October 8. OpenStack had more than 55 participants, ranging from college and university students, to professors and teachers, to professionals from various software companies. The day kicked off with a presentation by Egle Sigler on the basics of cloud computing and OpenStack, and what factors one must keep in mind when designing a fault tolerant architecture.

We divided the participants into smaller groups of five to six, and each group had a dedicated group volunteer. We had two activities planned out for the OSD. During the first activity, the participants wrote a Python script to deploy two web servers with an HAProxy server and a database server. The second activity involved deploying a demo Ushahidi application on cloud servers using Heat templates. Along with completing these activities, the participants were encouraged to architect and deploy their own solutions on the public cloud.

We had some pre-written base code in the GitHub repository to help the participants get started. We used OpenStack-powered Rackspace Cloud Servers for deployments. Some of the participants were more adventurous and even wrote code to back up their information using Swift/Cloud Files.

The participants were from different skill levels. For some of them it was their first time getting accustomed to the command line and using git; whereas for some it was their first time trying out OpenStack. Everyone who attended the code-a-thon got to learn something new!

At the end of the day, one of the participants, Yanwei Zhang, demoed how after decommissioning one of the two Apache servers the application still could be accessed using the load balancer IP.

We received some great feedback from the participants. Here are some of the responses we received in an anonymous survey:

Got to learn about OpenStack deployment and meet some great women.

It was fun learning something new.

I liked the participation of the volunteers, their experience was great to hear!

The Open Source Day would not have been possible without the help of the amazing volunteers who inspired the participants to keep hacking and learning. One of the participants mentioned: “The helpers were awesome, very positive, and obviously very enthusiastic about their work. Good job.” Overall, we had 14 volunteers, a mix of Rackers and graduates from the GNOME OPW program: Victoria Martínez de la Cruz, Jenny Vo, Sabeen Syed, Anne Gentle, Dragana Perez, Riddhi Shah, Zaina Afoulki, Lisa Clark, Zabby Damania, Cindy Pallares-Quezada, Besan Abu Radwan, Veera Venigalla, Carla Crull and Benita Daniel.

We would like to thank the OpenStack Foundation for providing free OpenStack t-shirts for all the participants and volunteers, Rackspace for providing free Cloud Servers to use for the workshop and the GHC OSD committee for working with us to make this possible.

by Egle Sigler and Iccha Sethi at October 16, 2014 09:00 PM

Rob Hirschfeld

Tweaking DefCore to subdivide OpenStack platform (proposal for review)

The following material will be a major part of the discussion for the OpenStack Board meeting on Monday 10/20. Comments and suggestions welcome!

For nearly two years, the OpenStack Board has been moving towards creating a common platform definition that can help drive interoperability. At the last meeting, the Board paused to further review one of the core tenets of the DefCore process (Item #3: Core definition can be applied equally to all usage models).

Outside of my role as DefCore chair, I see the OpenStack community asking itself an existential question: “are we one platform or a suite of projects?”  I’m having trouble believing “we are both” is an acceptable answer.

During the post-meeting review, Mark Collier drafted a Foundation supported recommendation that basically creates an additional core tier without changing the fundamental capabilities & designated code concepts.  This proposal has been reviewed by the DefCore committee (but not formally approved in a meeting).

The original DefCore proposed capabilities set becomes the “platform” level while capability subsets are called “components.” We are considering two initial components, Compute & Object, and both are included in the platform (see illustration below). The approach leaves the door open for new core components to exist both under and outside of the platform umbrella.

In the proposal, OpenStack vendors who meet either component or platform requirements can qualify for the “OpenStack Powered” logo; however, vendors using only a component (instead of the full platform) will have more restrictive marks and limitations on how they can use the term OpenStack.

This approach addresses the “is Swift required?” question.  For platform, Swift capabilities will be required; however, vendors will be able to implement the Compute component without Swift and implement the Object component without Nova/Glance/Cinder.

It’s important to note that there is only one yardstick for components or the platform: the capabilities groups and designated code defined by the DefCore process. From that perspective, OpenStack is one consistent thing. This change allows vendors to choose sub-components if that serves their business objectives.

It’s up to the community to prove the platform value of all those sub-components working together.


by Rob H at October 16, 2014 07:47 PM

Mirantis

Upgrading OpenStack workloads: introducing Pumphouse

Now that the tenth release of OpenStack has hit the market, it’s natural to see a multitude of deployments based on different releases out there. Here at Mirantis we sometimes face a situation where customers have their own flavor of OpenStack up and running, but want to replace it with Mirantis OpenStack to get the advantages of the latest OpenStack features. At the same time, however, most customers already have workloads in their cloud, and they want to move them to Mirantis OpenStack as is, or with minimal changes so their processes don’t break.

That’s the inspiration for Pumphouse – an open source project that works with the Fuel API to enable us to onboard customer workloads to Mirantis OpenStack with minimal additional hardware needed. The eventual goal for Pumphouse is to replace a customer’s ‘Frankenstack’ with Mirantis OpenStack on the same set of physical servers.

In this series of posts, we’re going to look at why we need Pumphouse, what it does, and how it does it.

What does Pumphouse do?

Pumphouse deals with two clouds: Source and Destination. Its task is to move resources from one cloud to another cloud. Once Source nodes have been cleared, Pumphouse can also automatically reassign them from the Source to the Destination cloud if the Destination cloud doesn’t have enough capacity to accept more workloads.


In Pumphouse, a workload is a set of virtual infrastructure resources provided by OpenStack to run a single application or a number of integrated applications working together.

The simplest definition of a workload is a single virtual server with all resources that were used to create it. We consider that a minimal possible workload. However, usually the migration of a single server doesn’t make sense, so the Pumphouse UI doesn’t support migration of individual virtual servers. (It is still possible with the Pumphouse CLI script, however.)

OpenStack groups virtual servers in Nova into projects. Pumphouse can then migrate a project as a whole, with all servers and other types of resources that belong to the project, such as flavors, images and networks.

Pumphouse also replicates credentials used to authenticate and authorize users in projects. Credentials include OpenStack tenants, users, roles, and role assignments.

How does Pumphouse move workloads?

Once we understood what we wanted to move, we had to decide how we wanted to move it.

Users will probably want their migrated resources to appear just as they did in the original cloud. That means Pumphouse must preserve as much of the meta-data of each resource as possible. It could copy parameters between the Source cloud’s database and the Destination cloud’s database, but that would be difficult if the database schema changes, especially when moving to a new release of OpenStack.

Another option is to use the OpenStack APIs as much as possible. APIs are much more stable (in theory, at least) than database models, so we decided that Pumphouse would avoid talking to the Source and Destination cloud databases, and instead leverage API calls to copy meta-data.
Pumphouse server migration

But what about actual data users generate and store in the Source cloud? Pumphouse has to move that, too. User data usually resides in server ephemeral storage and/or in virtual block volumes.

Fortunately, the OpenStack APIs provide ways to transfer that data without using lower-level technologies (such as SSH/dd/etc). For ephemeral storage, there is the Nova ‘snapshot’ feature, which creates a Glance image with the contents of the disk of the virtual server. For virtual block volumes, OpenStack allows us to copy data from the Cinder volume to a Glance image, and then restore the volume in Cinder from the Glance image in the Destination cloud. In both cases, the resulting image can be moved as usual.
Pumphouse image migration
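
For reference, here is a hedged sketch of how those two copies can be driven from the standard Python clients (the credentials, endpoints and names below are placeholders, and waiting for the resulting images to become active is omitted):

# Hedged sketch: turn an ephemeral disk and a Cinder volume into Glance images
# using python-novaclient and python-cinderclient. Values are placeholders.
from novaclient import client as nova_client
from cinderclient import client as cinder_client

nova = nova_client.Client('2', 'admin', 'secret', 'source-project',
                          'http://source-cloud:5000/v2.0')
cinder = cinder_client.Client('1', 'admin', 'secret', 'source-project',
                              'http://source-cloud:5000/v2.0')

# Ephemeral storage: snapshot the server's disk into a Glance image.
image_id = nova.servers.create_image('SERVER_UUID', 'migration-snapshot')

# Block storage: copy a Cinder volume's contents into a Glance image.
volume = cinder.volumes.get('VOLUME_UUID')
cinder.volumes.upload_to_image(volume, force=True,
                               image_name='migration-volume-image',
                               container_format='bare', disk_format='raw')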

Of course, for some tasks we have to use direct SQL queries to the database. This is especially true for user passwords. Obviously, you can neither query nor set them via the OpenStack API, even as encrypted hashes. Pumphouse requires access to Keystone’s tables in the Source and Destination databases to reset passwords.

What’s behind Pumphouse?

So before we move on to seeing Pumphouse in action, it’s helpful to understand the structure behind it.  Pumphouse is a Python application that provides an HTTP-based API. It relies on the Taskflow library for determined and parallelized execution of atomic tasks. There are tasks to fetch meta-data for different resources from the Source cloud, tasks to upload meta-data to the Destination cloud, and tasks that perform specific operations on resources in clouds, such as suspending the server or creating a snapshot.
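
To give a feel for that structure, here is a minimal, self-contained Taskflow sketch; the two tasks and the flow below are made up for illustration and are not Pumphouse's actual tasks:

# Minimal Taskflow example in the spirit of Pumphouse's task structure;
# the tasks below are illustrative only.
from taskflow import engines, task
from taskflow.patterns import linear_flow


class FetchFlavor(task.Task):
    default_provides = 'flavor_meta'

    def execute(self, flavor_id):
        # In Pumphouse this would call the Source cloud's Nova API.
        return {'id': flavor_id, 'vcpus': 4, 'ram': 8192}


class UploadFlavor(task.Task):
    def execute(self, flavor_meta):
        # In Pumphouse this would call the Destination cloud's Nova API.
        print('creating flavor on destination:', flavor_meta)


flow = linear_flow.Flow('migrate-flavor').add(FetchFlavor(), UploadFlavor())
engines.run(flow, store={'flavor_id': '110'})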

Pumphouse normally requires administrative access to both the Source and the Destination OpenStack APIs, and read/write access to the Identity tables in the OpenStack database. Access credentials must be provided via configuration file.

To upgrade hypervisor hosts, Pumphouse requires access to the IPMI network that connects them (and indeed all hosts). IPMI access credentials are also required; they must be provided in the configuration file (inventory.yaml).

Pumphouse must be installed on a system that has network connectivity to all OpenStack API endpoints so it can access both admin and public versions of those APIs. The Pumphouse host also must be able to access the IPMI interfaces of host nodes.

Pumphouse also has CLI scripts that allow you to perform migration tasks in stand-alone mode without running the Pumphouse service. These include pumphouse and pumphouse-bm for operations with hypervisor hosts (evacuation and upgrade).

Pumphouse is being developed in an open Github repository under the Apache 2.0 license. Come, see, fork and hack with us!

What’s next?

In this intro article I defined a problem that we are going to solve in the Pumphouse application. In short, Pumphouse is intended to do the following:

  • move workloads defined in a standard way from one cloud to another, assuming both clouds support the OpenStack APIs, and using those APIs wherever possible;

  • reassign hypervisor hosts from one cloud to another, assuming the destination cloud is a Mirantis OpenStack cloud, using the Fuel API for provisioning and deployment;

  • preserve the meta-data of resources being moved as much as possible.

The first version of Pumphouse covers a very simple use case of workload migration. Follow the next posts in this series to dive deeper into the product internals.

 

The post Upgrading OpenStack workloads: introducing Pumphouse appeared first on Mirantis | The #1 Pure Play OpenStack Company.

by Oleg Gelbukh at October 16, 2014 06:21 PM

IBM OpenStack Team

OpenStack guest CPU topology configuration: Part one

One of the new features of the Juno release of OpenStack is the ability to express a desired guest virtual machine (VM) CPU topology for libvirt managed VMs. In this context, CPU topology refers to the number of sockets, cores per socket and threads per core for a given guest VM.

Up until this point, when you requested an n vCPU VM through OpenStack, the resulting VM would be created with a CPU topology of n sockets, each with one single-thread core. For example, if I create a four-vCPU guest booted to a Fedora 20 Linux image on my POWER8 test system on Icehouse (also the default behavior for Juno), the CPU topology within the created VM would be four sockets, each with one single-threaded core.

[root@bare-precise ~]# lscpu
Architecture:          ppc64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Big Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Model:                 IBM pSeries (emulated by qemu)
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-3

As you can see, the resulting CPU topology for the VM is four sockets each with one core and each core with one thread.

Why is this an issue?

First, some operating systems (I believe certain versions of Windows) are limited to a maximum number of sockets. So, for example, if your operating system is limited to only running on systems that have two sockets or less, you are limited to using only two vCPU VMs.

If you could instead specify one socket, four cores per socket, and one thread per core, OR one socket, two cores per socket, and two threads per core, you could create VMs able to use more of the physical compute resources on a system.

Second, on systems that have hardware threads, such as simultaneous multithreading (SMT) on my POWER8 test box, these compute resources aren’t being utilized. It would be more desirable to run a four-vCPU guest on a single core (utilizing four hardware threads) and make the most of the computing power of this system.

Given that my POWER8 System has a CPU topology of four sockets, four cores per socket and eight threads per core, a significant amount of compute resources are not being utilized! Additionally, if I’m able to run a VM on a single core, the VM should be less likely to experience penalties related to nonuniform memory access (NUMA) effects, if otherwise the VM would have been assigned cores in different NUMA nodes.

The new virt-driver-vcpu-topology blueprint implemented in the Juno release helps address this situation.

At a high level, this new feature allows you to configure both limits and preferences for sockets, cores and threads at a flavor level as well as at an image level. The extra_specs defined for a flavor are as follows:

hw:cpu_sockets=NN - preferred number of sockets to expose to the guest
hw:cpu_cores=NN - preferred number of cores to expose to the guest
hw:cpu_threads=NN - preferred number of threads to expose to the guest
hw:cpu_max_sockets=NN - maximum number of sockets to expose to the guest
hw:cpu_max_cores=NN - maximum number of cores to expose to the guest
hw:cpu_max_threads=NN - maximum number of threads to expose to the guest

And the extra_specs defined for an image are:

hw_cpu_sockets=NN - preferred number of sockets to expose to the guest
hw_cpu_cores=NN - preferred number of cores to expose to the guest
hw_cpu_threads=NN - preferred number of threads to expose to the guest
hw_cpu_max_sockets=NN - maximum number of sockets to expose to the guest
hw_cpu_max_cores=NN - maximum number of cores to expose to the guest
hw_cpu_max_threads=NN - maximum number of threads to expose to the guest

The way this is envisioned to be used is that an administrator can set default options at the flavor level, and a user or administrator can further refine or limit the topology at the image level. The “preferred” versions of each option must not exceed that of the “maximum” setting.
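
For instance, a hedged sketch of setting the flavor-level preferences with python-novaclient might look like the following (the flavor name, credentials and values are examples only; the equivalent nova flavor-key and nova image-meta commands work just as well):

# Hedged example: set CPU topology extra_specs on an existing flavor.
from novaclient import client

nova = client.Client('2', 'admin', 'secret', 'demo', 'http://controller:5000/v2.0')

flavor = nova.flavors.find(name='m1.large')
# Admin-chosen defaults: at most 2 sockets, prefer 2 threads per core.
flavor.set_keys({'hw:cpu_max_sockets': '2', 'hw:cpu_threads': '2'})
print(flavor.get_keys())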

I hope this introduction has been helpful. In my next post, I’ll walk through some examples of using this new feature as I’ve done testing on my POWER8 PowerKVM system. Please leave a comment below to join the conversation.

The post OpenStack guest CPU topology configuration: Part one appeared first on Thoughts on Cloud.

by Jon Grimm at October 16, 2014 03:49 PM

Florian Haas

Hooray for Juno!

The OpenStack Juno release has landed, right on time as usual, and it's time to say a big thank you to all those wonderful OpenStack developers and contributors across the globe. Thanks for continuing to build something awe-inspiringly wonderful!

If you want to learn about OpenStack Juno, there are multiple opportunities:


by florian at October 16, 2014 03:25 PM

Rafael Knuth

Online Meetup: How is IBM using OpenStack? Part 2

In this meetup we will present and demonstrate IBM Cloud OpenStack Services, a hosted and...

October 16, 2014 02:56 PM

Cameron Seader

SUSE Cloud 4 OpenStack Admin Appliance; Updated

I hope all who have been using the appliance are enjoying it and finding it useful. I felt it was time to update the appliance to include patches for the latest known threats and critical updates to OpenStack.

Latest version 4.0.3

Changes from Github Project
  • Refreshed the Update Repositories to contain latest patches
  • Applied latest Updates to the Appliance
Direct Download links
Please visit the landing page for the appliance to get more information and documentation here.
Some future things to look forward to:
  • Incorporating an Installation media which includes the latest packages from the update repositories for SLES 11 SP3, High Availability Extension, and SUSE Cloud 4 OpenStack. This Installation media will allow me to exclude the full update repositories on the image and therefore reduce the size of the image to just under 2GB. 
  • Moving the build of the image over to openSUSE OBS (Open Build Service) to allow more rapid deployment and testing.
These things will allow for greater portability of the OpenStack software and, with it, the ability to install anywhere. Install on VMware. Install on VirtualBox. Install on KVM. Install on Bare Metal. You can truly use this image to deploy and test on VMware or KVM, and from the same image you can deploy a full production OpenStack on Bare Metal. I have even used it to install and test OpenStack on AWS. So go forth and enjoy installing OpenStack with ease. I challenge you to start using this appliance and see how easy it can be to set up and run OpenStack software.

by Cameron Seader (noreply@blogger.com) at October 16, 2014 02:41 PM

IBM OpenStack Team

Open, hybrid cloud is about to rain down. Are you ready?

We’re nearing a critical inflection point. The convergence of technologies like cloud, analytics, mobile and social business is a phenomenon that will transform industries. Customers and vendors that fail to leverage this unique opportunity to drive business model innovation will be left behind.

Most of the clients I talk to get it. However, they’re struggling to strike the right balance between optimization and innovation:

• On one side are traditional, back-office applications and workloads. We call these “Systems of Record.” If you are a retailer, think of your warehouse inventory records; if you are a healthcare provider, think of your patient records; if you are an insurer, your claims records, etc. In addition, there are traditional IT applications for customer relationship management, and finance and accounting systems, to name a few. These workloads are not going away. In some cases, clients want to move these traditional workloads to a cloud-based environment for greater efficiency, flexibility and access.

Systems of Record - Systems of Engagement

• On the other side, we have newer “Systems of Engagement.” For online retailers, customers can browse what is in their stores, recommend items and complete transactions via their phones; insurers have new mobile apps where customers can file accident claims via their mobile devices; and so on. These new mobile and social apps require an infrastructure where they can be quickly developed, deployed, modified, and redeployed. And then of course, you want to turn the data generated from these applications into insights that can be acted on.

Hybrid cloud is the future

IBM’s point of view is that the future will be a hybrid world of traditional IT, private and public clouds that enables composable business. This will be a dynamic cloud environment, one that is constantly adapting to meet new requirements. And one that leverages the speed and cost benefits of a public cloud; the security and control of a private cloud; and maximizes return on existing IT investments.

Dynamic hybrid cloud changes the way work gets done across roles:

Business leaders: Choose market leading services (in house and external) to quickly and easily implement business process changes, leverage new ways to engage with customers, and even define entirely new business models.

Developers: Build applications quickly and integrate them on or off premises using a robust set of tools for service composition to assemble composable services and APIs.

IT Leaders: Leverage a self-optimizing dynamic hybrid environment to securely deploy and move applications and data across IT, private and public cloud without the need for manual intervention while delivering the scalability and elasticity required.

Open by design

Dynamic hybrid clouds demand a different approach… one that is not vendor specific or proprietary. They must be based on open technologies, be secure, and be responsive. Supporting and building on open technologies is in our DNA. IBM is a leader in open cloud technologies like OpenStack, Cloud Foundry and Docker which will play an increasingly critical role as the foundational elements of hybrid cloud environments.

Why is this important? An open approach provides you flexibility to deploy your services across a wide range of platforms. You also get to leverage broad communities and ecosystems to accelerate innovation. Time and energy can be spent on invention not reinvention. Implement unique services that deliver value to your customers by building on open cloud technologies that are already available.

OpenStack Juno is here!

Momentum for open cloud technology continues to accelerate. Another key milestone was announced today with the availability of the OpenStack Juno release.

(Related: A guide to the OpenStack Juno release)

This latest release continues the drive to deliver enterprise features for security, automation, reliability and availability. More than 100 IBM developers were dedicated to the Juno release, participating in thousands of code reviews, fixing hundreds of bugs and contributing to dozens of new features. Here are a few examples of IBM’s contributions:

  • In support of hybrid cloud environments, IBM contributed new enhancements to enable multiple clouds to work together using industry standard identity federation protocols.
  • IBM expanded audit support across several OpenStack projects by leveraging the DMTF Cloud Audit Data Federation (CADF)
  • IBM contributors led the design for the new V2.1 API, simplifying future changes and ensuring backwards compatibility.
  • IBM was a lead contributor to Refstack, a project focused on assessing and certifying OpenStack interoperability.

The OpenStack ecosystem continues to experience outstanding growth—the Juno release had more than 1,500 contributors! You can learn more about OpenStack’s momentum and IBM’s contributions to Juno in Brad Topol’s blog.

Juno will be a major theme of the OpenStack Summit that will be held in Paris Nov. 3-7. IBM will have a major presence at the summit so I hope to see you there.

(Related: IBM technical sessions at the OpenStack Summit)

Join the party

I encourage you to engage and be proactive. Help us shape and accelerate the adoption of dynamic hybrid cloud computing:

These are exciting times full of promise and opportunities. IBM is a trusted partner you can count on to lead the way.

The post Open, hybrid cloud is about to rain down. Are you ready? appeared first on Thoughts on Cloud.

by Angel Luis Diaz at October 16, 2014 01:15 PM

A guide to the OpenStack Juno release

Enterprise readiness and user experience improvements are just a few of the highlights of the latest release of OpenStack, codenamed Juno. As the OpenStack ecosystem celebrates yet another successful milestone, I wanted to honor all of OpenStack’s developers by highlighting some of the best new features that users can expect in this latest release, as well as celebrate some truly amazing contributions by team IBM.

Ecosystem growth

The OpenStack ecosystem continues to experience outstanding growth. In the Juno release, the number of contributors has reached the impressive milestone of more than 2,500, with an average of 400 contributing monthly in the last 12 months. There have been almost 130,000 commits, with 61,000 in the last 12 months and an average of 4,000 monthly commits since the beginning of 2014, leaving no doubt that OpenStack is the place to be for IaaS open source development.

Like many of these loyal developers and sponsors, IBM remains committed to the success of OpenStack by increasing contributions focused on improving the OpenStack ecosystem for the benefit of the entire OpenStack community and accelerating its growth. I’m excited to have the opportunity to present an early preview of the key contributions for this latest release.

Increased enterprise security

IBM continues to lead several enterprise integration efforts in OpenStack’s Identity Service Project codenamed Keystone. Key enhancements by the IBM team include federating multiple Keystones, enhanced auditing, and better support for multiple identity backends. Consistent with the Icehouse release, IBM contributors collaborated heavily with contributors from CERN and Rackspace on new enhancements that enable Keystones from multiple clouds to work together using industry standard federation protocols thus laying a solid foundation for enabling OpenStack based hybrid clouds.

For users, this means it will be simpler to combine their on premise OpenStack cloud with the public cloud of their choice. IBM contributors also continued to expand critical CADF Standard-based auditing work to include audit support for users authenticating through a federated environment or when a role assignment is created or deleted.

Furthermore, full support for multiple identity backends was realized in the Juno release. This not only enables Keystone to support different corporate directories, but also allows a deployer to specify an SQL backend for local service accounts and an LDAP backend for all other users. This gives users a much simpler approach for integrating with their read-only LDAP backends.

Block storage improvements

For two consecutive release cycles, IBM has continued to lead in contributions to OpenStack’s Block Storage project, codenamed Cinder. This includes contributions such as the base enablement of volume replication, making it possible for Cinder to create volumes with a replica on the storage backend that can be quickly switched to in the case of a catastrophic storage system failure.

IBM also committed patches to enable internationalization support in Cinder logs, making it possible for OpenStack users to view Cinder logs in languages other than English. Security was enhanced via patches to scrub passwords from logs and to enhance the existing SSH functionality to not ignore unexpected system keys.

Enhanced user experience

In an effort to improve the OpenStack user experience, IBM increased its contributions in the Juno cycle to the OpenStack dashboard project, codenamed Horizon. IBMers contributed more than 1,500 code reviews and also were Horizon’s third highest committer. New key features were added such as custom tooltip support, filtering and pagination support, improved help text, and increased internationalization support. They also continue to work on improving performance and responsiveness by moving Horizon to a client-side rendering model.

New API

For the OpenStack Compute project, codenamed Nova, IBM contributors led the design of the new V2.1 API. This work makes it possible to evolve the API in the future with much less overhead than the current process, including backwards-incompatible changes that were not previously possible. We also made numerous bug-fixing contributions across Nova, and as part of the team monitoring the health of the OpenStack community development system we applied fixes to a variety of bugs affecting OpenStack Continuous Integration as problems arose.

It’s all about interoperability

IBM has also been a strong contributor to Refstack, the OpenStack community project focused on assessing OpenStack interoperability, which provides an API test certification process for OpenStack service and product vendors. IBM was the top contributor to the refstack-client project, a test tool that runs compliance tests against a target OpenStack deployment, and the second-highest contributor to the Refstack project, a community portal that stores test results and performs analysis and reporting of compliance outcomes. Refstack ensures that OpenStack based offerings are interoperable, which is the true value behind building on open technologies.

Unfortunately it is simply not possible in this article to cover all the innovations that have been added to OpenStack by IBM in the Juno release. Furthermore, there are many other outstanding contributions in this release by active contributors from other companies. Please join us at the next OpenStack Summit in Paris Nov. 3-7 for a much more comprehensive overview of the advances and improvements in the latest version of OpenStack. I look forward to seeing you in Paris!

(Related: IBM technical sessions at the OpenStack Summit)

The post A guide to the OpenStack Juno release appeared first on Thoughts on Cloud.

by Brad Topol at October 16, 2014 12:30 PM

Miguel Ángel Ajo

How to debug SE linux errors, and write custom policies

Sometimes you find yourself trying to debug a problem with SELinux, especially during software development or when packaging new software features.

I have found this happens quite often with neutron agents, as new system interactions are developed.

Disabling SELinux during development is generally a bad idea, because you’ll only discover such problems later and under higher pressure (release deadlines).

Here we show a recipe, from Kashyap Chamarthy, to find out what rules are missing, and generate a possible SELinux policy:

$ sudo su -

# make sure selinux is enabled

$ setenforce 1

# Clear your audit log
$ > /var/log/audit/audit.log

# Supposing the problem was in neutron-dhcp-agent, restart it
$ systemctl restart neutron-dhcp-agent

# and wait for the problem to be reproduced..

$ cat /var/log/audit/audit.log

# Show a reference policy

$ cat /var/log/audit/audit.log | audit2allow -R

At that point, report a bug so you get those policies incorporated in advance. Give a good description of what’s blocked by the policies, and why it needs to be unblocked.

Now you can generate a policy, and install it locally:

# Generate an SELinux loadable module package

$ audit2allow -a -M neutron

# Install the Policy Package
$ semodule -i neutron.pp

# Restart neutron-dhcp-agent (or re-trigger the problem to make sure it’s fixed)
$ systemctl restart neutron-dhcp-agent

Make sure to report the issue to the right distribution(s) (for example Red Hat RDO/RHOS), so this will get fixed properly in advance and won’t cause trouble at final release stages.

October 16, 2014 08:30 AM

Everett Toews

Forming the OpenStack API Working Group

Have you ever been flummoxed by the inconsistencies and odd design decisions of the OpenStack APIs?

A new working group about APIs is forming in the OpenStack community. Its purpose is "To propose, discuss, review, and advocate for API guidelines for all OpenStack Programs to follow." To learn more read the API Working Group wiki page.

API Working Group Participation

You too can help make the OpenStack APIs better going forward by participating in this group. Many people have already committed to making the APIs better. I invite you to add yourself to the members on the wiki page and watch the openstack-dev mailing list for emails prefixed with [api]. If you're only interested in this topic, you can subscribe to just the API Working Group topic under the Subscription Options.

It's important that this working group have deliverables: specific things the group can do so that the members' work has a measurable impact on improving the OpenStack APIs.

The deliverables for the API WG are:

  1. Agreed upon and accepted guidelines must be added to version controlled documents.
  2. Review comments on code changes and specs that affect any OpenStack Program API.

The purpose of these deliverables is not only to create guidelines but to act on them as well. As the guidelines are settled upon, the members can use them to review code and spec changes that affect APIs in Gerrit. Commenting on reviews and giving a +1 or -1 to APIs that follow or don't follow the API guidelines will give this working group the potential to effect real change in OpenStack.

There’s an ongoing thread (starting here) on the openstack-dev mailing list about kicking things off. Not every aspect of how the API WG will operate has been settled, so feel free to comment. New conversations around the APIs have already begun; for example, this thread discussing a task-based API for long-running operations.

Coda

This is only the beginning of the API Working Group. Let's continue the conversation on the openstack-dev mailing list. Let's propose cross project API guideline sessions at the OpenStack Paris Summit. Above all, let's focus on meeting our deliverables.

October 16, 2014 05:00 AM

October 15, 2014

Rob Hirschfeld

OpenStack Goldilocks’ Syndrome: three questions to help us find our bearings

Goldilocks Atlas

Action: Please join Stefano, Allison, Sean and me in Paris on Monday, November 3rd, in the afternoon (schedule link)

If wishes were fishes, OpenStack’s rapid developer and user rise would include graceful process and commercial transitions too.  As a Foundation board member, it’s my responsibility to help ensure that we’re building a sustainable ecosystem for the project.  That’s a Goldilocks challenge, because adding either too many or too few controls and processes will harm the project.

In discussions with the community, that challenge seems to break down into three key questions:

After last summit, a few of us started a dialog around Hidden Influencers that helps to frame these questions in an actionable way.  Now it’s time for us to come together and talk in Paris, in the hallways and specifically on Monday, November 3rd, in the afternoon (schedule link).  From there, we’ll figure out next steps using these three questions as a baseline.

If you’ve got opinions about these questions, don’t wait for Paris!  I’d love to start the discussion here in the comments, on Twitter (@zehicle), by phone, with email or via carrier pigeon.


by Rob H at October 15, 2014 06:02 PM

Russell Bryant

OpenStack Instance HA Proposal

In a perfect world, every workload that runs on OpenStack would be a cloud native application that is horizontally scalable and fault tolerant to anything that may cause a VM to go down.  However, the reality is quite different.  We continue to see a high demand for support of traditional workloads running on top of OpenStack and the HA expectations that come with them.

Traditional applications run on top of OpenStack just fine for the most part.  Some applications come with availability requirements that a typical OpenStack deployment will not satisfy automatically.  If a hypervisor goes down, there is nothing in place that tries to rescue the VMs that were running there.  There are some features in place that allow manual rescue, but they require intervention from a cloud operator or an external orchestration tool.

This proposal discusses what it would take to provide automated detection of a failed hypervisor and the recovery of the VMs that were running there.  There are some differences to the solution based on what hypervisor you’re using.  I’m primarily concerned with libvirt/KVM, so I assume that for the rest of this post.  Except where libvirt is specifically mentioned, I think everything applies just as well to the use of the xenserver driver.

This topic is raised on a regular basis in the OpenStack community.  There has been pushback against putting this functionality directly in OpenStack.  Regardless of what components are used, I think we need to provide an answer to the question of how this problem should be approached.  I think this is quite achievable today using existing software.

Scope

This proposal is specific to recovery from infrastructure failures.  There are other types of failures that can affect application availability.  The guest operating system or the application itself could fail.  Recovery from these types of failures is primarily left up to the application developer and/or deployer.

It’s worth noting that the libvirt/KVM driver in OpenStack does contain one feature related to guest operating system failure.  The libvirt-watchdog blueprint was implemented in the Icehouse release of Nova.  This feature allows you to set the hw_watchdog_action property on either the image or flavor.  Valid values include poweroff, reset, pause, and none.  When this is enabled, libvirt will enable the i6300esb watchdog device for the guest and will perform the requested action if the watchdog is triggered.  This may be a helpful component of your strategy for recovery from guest failures.
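For illustration only (this is not from the original post, and the exact extra spec key and client signatures are assumptions that may vary by release), turning the watchdog on for every instance of a flavor with python-novaclient might look something like this:

```python
# Hedged sketch: enable the libvirt watchdog for all instances of a flavor.
# The "hw:watchdog_action" key and the Client() signature are assumptions
# based on Icehouse-era clients; the image property route is also possible.
from novaclient import client

nova = client.Client("2", "admin", "secret", "demo",
                     "http://keystone.example.com:5000/v2.0")

flavor = nova.flavors.find(name="m1.small")
# "reset" reboots the guest when the i6300esb watchdog fires; the post also
# lists poweroff, pause and none as valid actions.
flavor.set_keys({"hw:watchdog_action": "reset"})
```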

Architecture

A solution to this problem requires a few key components:

  1. Monitoring – A system to detect that a hypervisor has failed.
  2. Fencing – A system to fence failed compute nodes.
  3. Recovery – A system to orchestrate the rescue of VMs from the failed hypervisor.

Monitoring

There are two main requirements for the monitoring component of this solution.

  1. Detect that a host has failed.
  2. Trigger an automatic response to the failure (Fencing and Recovery).

It’s often suggested that the solution for this problem should be a part of OpenStack.  Many people have suggested that all of this functionality should be built into Nova.  The problem with putting it in Nova is that it assumes that Nova has proper visibility into the health of the infrastructure that Nova itself is running on.  There is a servicegroup API that does very basic group membership.  In particular, it keeps track of active compute nodes.  However, at best this can only tell you that the nova-compute service is not currently checking in.  There are several potential causes for this that would still leave the guest VMs running just fine.  Getting proper infrastructure visibility into Nova is really a layering violation.  Regardless, it would be a significant scope increase for Nova, and I really don’t expect the Nova team to agree to it.

It has also been proposed that this functionality be added to Heat.  The most fundamental problem with that is that a cloud user should not be required to use Heat to get their VM restarted if something fails.  There have been other proposals to use other (potentially new) OpenStack components for this.  I don’t like that for many of the same reasons I don’t think it should be in Nova.  I think it’s a job for the infrastructure supporting the OpenStack deployment, not OpenStack itself.

Instead of trying to figure out which OpenStack component to put it in, I think we should consider this a feature provided by the infrastructure supporting an OpenStack deployment.  Many OpenStack deployments already use Pacemaker to provide HA for portions of the deployment.  Historically, there have been scaling limits in the cluster stack that made Pacemaker not an option for use with compute nodes, since there are far too many of them.  This limitation is actually in Corosync and not Pacemaker itself.  More recently, Pacemaker has added a new feature called pacemaker_remote, which allows a host to be a part of a Pacemaker cluster, without having to be a part of a Corosync cluster.  It seems like this may be a suitable solution for OpenStack compute nodes.

Many OpenStack deployments may already be using a monitoring solution like Nagios for their compute nodes.  That seems reasonable, as well.

Fencing

To recap, fencing is an operation that completely isolates a failed node.  It could be IPMI based where it ensures that the failed node is powered off, for example.  Fencing is important for several reasons.  There are many ways a node can fail, and we must be sure that the node is completely gone before starting the same VM somewhere else.  We don’t want the same VM running twice.  That is certainly not what a user expects.  Worse, since an OpenStack deployment doing automatic evacuation is probably using shared storage, running the same VM twice can result in data corruption, as two VMs will be trying to use the same disks.  Another problem would be having the same IPs on the network twice.

A huge benefit of using Pacemaker for this is that it has built-in integration with fencing, since it’s a key component of any proper HA solution.  If you went with Nagios, fencing integration may be left up to you to figure out.

Recovery

Once a failure has been detected and the compute node has been fenced, the evacuation needs to be triggered.  To recap, evacuation is restarting an instance that was running on a failed host by moving it to another host.  Nova provides an API call to evacuate a single instance.  For this to work properly, instance disks should be on shared storage.  Alternatively, they could all be booted from Cinder volumes.  Interestingly, the evacuate API will still run even without either of these things.  The result is just a new VM from the same base image but without any data from the old one.  The only benefit then is that you get a VM back up and running under the same instance UUID.

A common use case with evacuation is “evacuate all instances from a given host”.  Since this is common enough, it was scripted as a feature in the novaclient library.  So, the monitoring tool can trigger this feature provided by novaclient.
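As a rough sketch of what that recovery step could look like (illustrative only, not the proposal's reference implementation; host names, credentials and exact novaclient signatures are assumptions), the monitoring tool could drive the evacuation like this:

```python
# Hedged sketch: evacuate everything that was running on a fenced compute
# node. Assumes shared storage so that the instance disks survive.
from novaclient import client

nova = client.Client("2", "admin", "secret", "admin",
                     "http://keystone.example.com:5000/v2.0")

failed_host = "compute-02.example.com"   # the node that was just fenced
target_host = "compute-03.example.com"   # a healthy node to receive the VMs

for server in nova.servers.list(search_opts={"host": failed_host,
                                             "all_tenants": True}):
    # Equivalent in spirit to novaclient's "host-evacuate" convenience
    # feature mentioned above, one instance at a time.
    nova.servers.evacuate(server, host=target_host, on_shared_storage=True)
```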

If you want this functionality for all VMs in your OpenStack deployment, then we’re in good shape.  Many people have made the additional request that users should be able to request this behavior on a per-instance basis.  This does indeed seem reasonable, but poses an additional question.  How should we let a user indicate to the OpenStack deployment that it would like its instance automatically recovered?

The typical knobs used are image properties and flavor extra-specs.  That would certainly work, but it doesn’t seem quite flexible enough to me.  I don’t think a user should have to create a new image to mark it as “keep this running”.  Flavor extra-specs are fine if you want this for all VMs of a particular flavor or class of flavors.  In either case, the novaclient “evacuate a host” feature would have to be updated to optionally support it.

Another potential solution to this is by using a special tag that would be specified by the user.  There is a proposal up for review right now to provide a simple tagging API for instances in Nova.  For this discussion, let’s say the tag would be automatic-recovery.  We could also update the novaclient feature we’re using with support for “evacuate all instances on this host that have a given tag”.  The monitoring tool would trigger this feature and ask novaclient to evacuate a host of all VMs that were tagged with automatic-recovery.

Conclusions and Next Steps

Instance HA is clearly something that many deployments would like to provide.  I believe that this could be put together for a deployment today using existing software, Pacemaker in particular.  A next step here is to provide detailed information on how to set this up and also do some testing.

I expect that some people might say, “but I’m already using system Foo (Nagios or whatever) for monitoring my compute nodes”.  You could go this route, as well.  I’m not sure about fencing integration with something like Nagios.  If you skip the use of fencing in this solution, you get to keep the pieces when it breaks.  Aside from that, your monitoring system could trigger the evacuation functionality of novaclient just like Pacemaker would.

Some really nice future development around this would be integration into an OpenStack management UI.  I’d like to have a dashboard of my deployment that shows me any failures that have occurred and what responses have been triggered.  This should be possible since pcsd offers a REST API (WIP) that could export this information.

Lastly, it’s worth thinking about this problem specifically in the context of TripleO.  If you’re deploying OpenStack with OpenStack, should the solution be different?  In that world, all of your baremetal nodes are OpenStack resources via Ironic.  Ceilometer could be used to monitor the status of those resources.  At that point, OpenStack itself does have enough information about the supporting infrastructure to perform this functionality.  Then again, instead of trying to reinvent all of this in OpenStack, we could just use the more general Pacemaker based solution there, as well.


by russellbryant at October 15, 2014 05:18 PM

Rafael Knuth

Recording, Slides and Podcast: What is Trove, the Database as a Service on OpenStack?

In case you missed our recent Online Meetup: Recording, Slides, Podcast. Don’t miss any of our...

October 15, 2014 03:56 PM

Kashyap Chamarthy

LinuxCon/KVMForum/CloudOpen Eu 2014

While the Linux Foundation’s colocated events (LinuxCon/KVMForum/CloudOpen, Plumbers and a bunch of others) are still in progress (Düsseldorf, Germany), thought I’d quickly write a note here.

Some slides and demo notes on managing snapshots/disk image chains with libvirt/QEMU. And, some additional examples with a bit of commentary. (Thanks to Eric Blake, of libvirt project, for reviewing some of the details there.)


by kashyapc at October 15, 2014 02:47 PM

IBM OpenStack Team

Dive into the software-defined network ocean with OpenStack

By John M. Tracey and Ramesh Menon

One area where the cloud can be particularly useful is in simplifying the traditionally onerous process of provisioning and configuring network infrastructure and services. Many people wonder though how they can begin to utilize the cloud for networking. In this post, we describe an easy way to get started with a software-defined network (SDN) using OpenStack.

Businesses want all the benefits of the existing cloud infrastructure—agility, openness and elasticity—without any constraints. They need the speed of bare metal servers, dynamic provisioning capabilities, high data integrity, fast data recovery and instant multitenant scaling without boundaries.

High user expectations also require new levels of visibility and control to instantly fix latency and broken applications. Clients need new ways to manage capital expenditure and operational cost and maximize returns on invested capital. One area where these needs are particularly acute is the network. This has led to a flurry of development of new network capabilities across cloud providers.

SDN diagram

OpenStack Neutron provides a network application programming interface (API) that defines a rich set of constructs and operations for the cloud. One benefit of Neutron is that, while it defines the API, it allows the implementation to be provided by a separate plug-in. This allows OpenStack users to benefit from open standard interfaces while availing themselves of industry-leading implementations, without any vendor lock-in.
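As a small, hedged illustration of that API (not taken from this post; the credentials and URLs are placeholders), creating a network and subnet with python-neutronclient looks the same no matter which plug-in sits underneath:

```python
# Illustrative sketch: the Neutron API calls are plug-in agnostic, so the
# same code runs against Open vSwitch, SDN VE or any other backend.
from neutronclient.v2_0 import client

neutron = client.Client(username="demo", password="secret",
                        tenant_name="demo",
                        auth_url="http://keystone.example.com:5000/v2.0")

network = neutron.create_network({"network": {"name": "demo-net"}})
neutron.create_subnet({"subnet": {"network_id": network["network"]["id"],
                                  "ip_version": 4,
                                  "cidr": "10.0.0.0/24"}})
```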

For example, many OpenStack users employ the Open vSwitch Neutron plug-in for a purely open source implementation. Enterprise users may be more inclined to utilize a commercial implementation such as the IBM SDN for Virtual Environments (SDN VE). This particular product offers the benefit of integrating with both OpenStack and VMware.

One key benefit of the Neutron plug-in approach is that users of one plug-in do not need to learn anything new when they utilize another. So you can start with Open vSwitch, which is available at no charge, and transfer your knowledge from that experience if you decide to move to a commercial implementation such as SDN VE.

You can use an SDN to create secure cloud enclaves and dynamic networks to enhance your cyber security posture and overall system assurance levels. As the technology continues to evolve, IBM and other cloud providers will likely have innovative SDN capabilities to take cloud computing to next level.

We invite you to share your thoughts on OpenStack networking, particularly if you have questions or experience using one or more Neutron plug-ins. Please continue the discussion below or reach us on Twitter @RMenon_CTO and @JMTracey.

The post Dive into the software-defined network ocean with OpenStack appeared first on Thoughts on Cloud.

by IBM Cloud Staff at October 15, 2014 01:11 PM

Alessandro Pilotti

OpenStack on Hyper-V – Icehouse 2014.1.3 – Best tested compute component?

Releasing stable components of a large cloud computing platform like OpenStack is not something that can be taken lightly; there are simply too many variables and moving parts that need to be taken into consideration.

The OpenStack development cycle includes state-of-the-art continuous integration testing, including a large number of third-party CI testing infrastructures, to make sure that any new code contribution won’t break the existing codebase.

The OpenStack on Hyper-V 3rd party CI is currently available for Nova and Neutron (with Cinder support in the works and more projects along the way), spinning up an OpenStack cloud with Hyper-V nodes for every single new code patchset to be tested, which means hundreds of clouds deployed and dismissed per day. It’s hosted by Microsoft and maintained by a team composed of Microsoft and Cloudbase Solutions engineers.

This is a great achievement, especially when considered in the whole OpenStack picture, where dozens of other testing infrastructures operate in a similar way while hundreds of developers tirelessly submit code to be reviewed. Thanks to this large scale joint effort, QA automation has surely been taken to a whole new level.

Where’s the catch?

There’s always a tradeoff between the desired workload and the available resources. In an ideal world, we would test every possible use case scenario, including all combinations of supported operating systems and component configurations. The result would simply require too many resources, or execution times in the order of a few days. Developers and reviewers need to know if the code passed tests, so long test execution times are simply detrimental to the project. A look at the job queue shortly before a code freeze day will give a very clear idea of what we are talking about :-).

On the other side, stable releases require as much testing as possible, especially if you plan to sleep at night while your customers deploy your products in production environments.

To begin with, the time constraints that continuous integration testing imposes disappear, since in OpenStack we have a release every month or so, and this leads us to:

Putting the test scenarios together

We just need a matrix of the operating systems and project-specific option combinations that we want to test. The good news here is that the actual tests to be performed are the same ones used for continuous integration (Tempest), simply repeated for every scenario.

For the specific Hyper-V compute case, we need to test features that the upstream OpenStack CI infrastructure cannot test. Here’s a quick rundown list:

  • Every supported OS version: Hyper-V 2008 R2, 2012, 2012 R2 and vNext.
  • Live migration, which requires 2 compute servers per run
  • VHD and VHDX images (fixed, dynamic)
  • Copy on Write (CoW) and full clones 
  • Various Neutron network configurations: VLAN, flat and soon Open vSwitch!
  • Dynamic / fixed VM memory
  • API versions (v1, v2)
  • A lot more coming with the Kilo release: Hyper-V Generation 2 VMs, RemoteFX, etc

 


Downstream bug fixes and features

Another reason for performing additional tests is that “downstream” product releases integrate the “upstream” projects (the ones available on the Launchpad project page and related git repositories) with critical bug fixes not yet merged upstream (time to land a patch is usually measured in weeks) and, optionally, new features backported from subsequent releases.

For example the OpenStack Hyper-V Icehouse 2014.1.3 release includes the following additions:

Nova

  • Hyper-V: cleanup basevolumeutils
  • Hyper-V: Skip logging out in-use targets
  • Fixes spawn issue on Hyper-V
  • Fixes Hyper-V dynamic memory issue with vNUMA
  • Fixes differencing VHDX images issue on Hyper-V
  • Fixes Hyper-V should log a clear error message
  • Fixes HyperV VM Console Log
  • Adds Hyper-V serial console log
  • Adds Hyper-V Compute Driver soft reboot implementation
  • Fixes Hyper-V driver WMI issue on 2008 R2
  • Fixes Hyper-V boot from volume live migration
  • Fixes Hyper-V volume discovery exception message
  • Add differencing vhdx resize support in Hyper-V Driver
  • Fixes Hyper-V volume mapping issue on reboot
  • HyperV Driver – Fix to implement hypervisor-uptime

Neutron

  • Fixes Hyper-V agent port disconnect issue
  • Fixes Hyper-V 2008 R2 agent VLAN Settings issue
  • Fixes Hyper-V agent stateful security group rules

Ceilometer

  • No changes from upstream

Running all the relevant integration tests against the updated repositories provides an extremely important proof for our users that the quality standards are well respected.

Source code repositories:

Packaging

Since we released the first Hyper-V installer for Folsom we have had a set of goals:

  • Easy to deploy
  • Automated configuration
  • Unattended installation
  • Include a dedicated Python environment
  • Easy to automate with Puppet, Chef, SaltStack, etc
  • Familiar for Windows users
  • Familiar for DevOps
  • Handle required OS configurations (e.g. create VMSwitches)
  • No external requirements / downloads
  • Atomic deployment

The result is the Hyper-V OpenStack MSI installer that keeps getting better with every release:

 

Sharing the test results

Starting with Icehouse 2014.1.3 we decided to publish the test results and the tools that we use to automate the test execution:

Test results

http://www.cloudbase.it/openstack-hyperv-release-tests-results

Each release contains a subfolder for every test execution (Hyper-V 2012 R2 VHDX, Hyper-V 2012 VHD, etc), which in turn will contain the results in HTML format and every possible log, configuration file, list of applied Windows Update hot fixes, DevStack logs and so on.

Test tools

All the scripts that we are using are available here:

https://github.com/cloudbase/openstack-hyperv-release-tests

The main goal is to provide a set of tools that anybody can use efficiently with minimum hardware requirements and reproduce the same tests that we run (see for example the stack of Intel NUCs above).

Hosts:

  • Linux host running Ubuntu 12.04 or 14.04
  • One or more Hyper-V nodes

Install the relevant prerequisites on the Linux node.

Enable WinRM with HTTPS on the Hyper-V nodes.

Edit config.yaml, providing the desired Hyper-V node configurations and run:

./run.sh https://www.cloudbase.it/downloads/HyperVNovaCompute_Icehouse_2014_1_3.msi stable/icehouse

The execution can be easily integrated with Jenkins or any other automation tool.

The script can also be run with custom parameters to test individual platforms.

We are definitely happy with the way in which Hyper-V support in OpenStack is growing. We are adding lots of new features and new developers keep on joining the ranks, so QA became an extremely important part of the whole equation. Our goal is to keep the process open so that anybody can review and contribute to our testing procedures for both the stable releases and the master branch testing executed on the Hyper-V CI infrastructure.

The post OpenStack on Hyper-V – Icehouse 2014.1.3 – Best tested compute component? appeared first on Cloudbase Solutions.

by Alessandro Pilotti at October 15, 2014 12:01 PM

Tesora Corp

Short Stack: OpenStack Juno Release Candidate 2, EMC grabs Cloudscaling and OpenStack makes smarter data centers

Welcome to the Short Stack, our weekly feature where we search for the most intriguing OpenStack links to share with you. These links may come from traditional publications or company blogs, but if it's about OpenStack, we'll find the best links we can to share with you every week.

If you like what you see, please consider subscribing.

Here we go with this week's links:

EMC Acquiring Cloud-Computing Startup Cloudscaling | Bloomberg

EMC became the latest large computer company to buy an OpenStack cloud startup when it bought Cloudscaling yesterday. The startup gives EMC's cloud business a way into the growing OpenStack ecosystem, and for a company like EMC that's struggling to define itself in a changing market, that's likely a good move.

How OpenStack integration can create smarter cloud data centres | ITWire

OpenStack provides an operating layer for the modern data center, connecting public and private clouds and providing a stack of services including security, database, storage and so forth, which gives IT pros the means to manage an increasingly complex environment in a more intelligent way.

Interview with Mark Voekler of Cisco on OpenStack | Opensource.com

As the head of the team in charge of understanding what it takes to build a successful OpenStack project at Cisco, Voekler is in a unique position to talk about the challenges involved and he plans to give a talk on the subject at an upcoming community meeting. In this interview he talks about his job, his upcoming talk, the state of the community and more.

OpenStack Juno Races to Completion as RC2s are Released | InternetNews

It seems like we just got Icehouse, but time waits for no one, not even OpenStack, and the community has been hard at work on the next version of the project, dubbed Juno. Just last week, Release Candidate 2 hit the streets, and it won't be long before Juno is ready for wider distribution.

OpenStack is nowhere near a "solved problem" | TechRepublic

When you live on the cutting edge, it's easy to believe that everyone already understands what you get, but this writer says OpenStack founder Josh McKenty, who he says "bleeds OpenStack," was probably jumping the gun when he said he joined Pivotal to work on Cloud Foundry because OpenStack is too mature for his taste. There is still a lot of work to be done.

by 693 at October 15, 2014 11:30 AM

Percona

Rackspace doubling-down on open-source databases, Percona Server

Founded in 1998, Rackspace has evolved over the years to address the way customers are using data – and more specifically, databases. The San Antonio-based company is fueling the adoption of cloud computing among organizations large and small.

Today Rackspace is doubling down on open source database technologies. Why? Because that’s where the industry is heading, according to Sean Anderson, Manager of Data Services at Rackspace. The company, he said, created a separate business unit of 100+ employees focused solely on database workloads.

The key technologies under the hood include both relational databases (e.g., MySQL, Percona Server, and MariaDB) and NoSQL databases (e.g., MongoDB, Redis, and Apache Hadoop).

Last July Rackspace added support for Percona Server and MariaDB to their Cloud Databases DBaaS (Database-as-a-Service) product, primarily at the request of application developers who had been requesting more open source database support options.

Matt Griffin, Percona director of product management, and I recently sat down with Sean and his colleague Neha Verma, product manager of Cloud Databases. Our discussion focused on the shift to DBaaS as well as what to expect in the future from Rackspace in terms of Cloud Databases, OpenStack Trove and more.

* * *

Matt: Why did you expand the Cloud Databases product this past summer?
Sean: We launched Cloud Databases about a year and a half ago. Since then we’ve rolled out feature after feature (backups, monitoring, configuration management and so on) focused on simplifying our customers’ lives; this, backed by Fanatical Support, has made the product easier to use and more production-ready than ever. We understand that features aren’t enough, so in addition to all the features we have also made significant improvements to the hardware and network infrastructure. All this means that we’ve been very busy not just expanding the offering but also making it simpler to use, more complete and more scalable.

Our vision is to offer a robust platform with the most popular Big Data, SQL, and NoSQL databases on dedicated, bare metal, and public cloud infrastructure.

Matt: What type of customer is your Cloud Databases offering aimed at?
Sean: Currently we have a variety of customers running multiple Cloud Databases instances, ranging from customers running a two-month marketing campaign to customers running web applications and ecommerce applications with highly transactional database workloads. Our customers prefer the simplicity and reliability of the service, which allows them to focus on their business and not worry about the heavy lifting associated with scaling and managing databases.

Matt: How is your Cloud Databases offering backed-up?
Neha: We use Percona XtraBackup to perform a hot copy of all databases on an instance and then stream the backups to Cloud Files for storage. A customer can restore a backup to a new instance at any time. Percona XtraBackup is the only option we offer customers right now.

Tom: In terms of security, how do you address customer concerns? Are cloud-based open source databases more secure?
Sean: Data security concerns are at an all-time high, and we have a number of upcoming features that continue to address those concerns. Today we offer a number of unique features; specifically, Cloud Databases can only be accessed on the private network, so the database can only be reached by systems on your private network. Additionally, we support SSL for communication between the user application and the database instance, so that any data transfer is encrypted in transit. These features, along with the built-in user controls and authentication mechanisms, go a long way toward addressing customers’ security concerns. Ultimately, cloud-based open source databases are no more or less secure than any other database; security is about more than features, it is about the process and people that build and manage your database, and we have those more than covered.

Matt: Is this for production applications or pre-production?
Sean: It’s very much production capable. While there’s a perception that this type of offering would only fit for use cases around test or dev, the truth is we are running hundreds of very large, fully managed instances of MySQL on the cloud. We don’t make any delineation between production or pre-production. However, we’re definitely seeing more and more production workloads come onto the service as people are getting educated on the development work that we’ve done around adding these new features. Replication and monitoring are the two most popular right now.

Matt: How are people accessing and using it?
Sean: A majority of our users access the database via the Control Panel, the API or a command-line utility.

Matt: Since the launch, how has the reaction been?
Sean: The reaction from the press standpoint has been very positive. When we talk with industry analysts they see our commitment to open source and where we are going with this.

Tom: How committed is Rackspace to OpenStack?
Sean: We all live in OpenStack. We have tons of Rackers heading to the upcoming OpenStack Paris Summit in November. We’re looking forward to many years of contributing to the OpenStack community.

Tom: Last April, Rackspace hosted several sessions on OpenStack and Trove at the Percona Live MySQL Conference and Expo 2014 in Santa Clara, Calif. What are you looking forward to most at Percona Live 2015?
Sean: For us, Percona Live is about listening to the MySQL community. It’s our best opportunity each year to actually setup shop and get to learn what’s top of mind for them. We then can take that information and develop more towards that direction.

Tom: And as you know we’re also launching “OpenStack Live” to run parallel to the Percona Live MySQL conference. OpenStack Live 2015 runs April 13-14 and will focus on the essential elements of making OpenStack work better, with emphasis on the critical role of MySQL and the value of Trove. I look forward to hearing the latest news from Rackspace at both events.

Thanks Sean and Neha for speaking with us and I look forward to seeing you this coming April in Santa Clara at Percona Live and OpenStack Live!

On a related note, I’ll also be attending Percona Live London (Nov. 3-4) where we’ll have sessions on OpenStack Trove and everything MySQL. If you plan on attending, please join me at the 2014 MySQL Community Dinner (pay-your-own-way) on Nov. 3. (Register here to reserve your spot at the Community Dinner because space will be limited. You do not need to attend Percona Live London to join the dinner).

The post Rackspace doubling-down on open-source databases, Percona Server appeared first on MySQL Performance Blog.

by Tom Diederich at October 15, 2014 07:00 AM

October 14, 2014

IBM OpenStack Team

IBM technical sessions at the OpenStack Summit in Paris

I’m getting ready to attend my seventh OpenStack Summit. I’ve been to every one since the Essex Summit in Boston, and I know how daunting it is to look at the schedule and try to pick what sessions to attend. Does the session description catch me? Does it sound like a vendor pitch? Do the speakers have a good reputation? All those questions matter to me—and likely they matter to you too.

To help make the scheduling decisions a little easier, or at least reduce the number of slots that you need to fill, I wanted to highlight the IBM sessions that were selected by the community. I’ve provided some information on who is presenting them, and what you can expect.

Monday, November 3

11:40
IPv6 Features in OpenStack Juno
Xu Han Peng, joint with Comcast and Cisco
Xu Han, a technical lead on the IBM Cloud Manager with OpenStack project, has teamed up with Comcast and Cisco to give an overview of the IPv6 support in the Juno release as well as what is planned for the upcoming Kilo release. Expect a presentation deep on content; Xu Han has been responsible for IPv6 support as part of IBM OpenStack product efforts for the last two years.

15:20
When Disaster Strikes the Cloud: Who, What, When, Where and How to Recover

Michael Factor and Ronen Kat, joint with Red Hat
Two leading cloud storage researchers at IBM, Distinguished Engineer Michael Factor and Ronen Kat, along with Red Hat, will present how to leverage the basic building blocks of OpenStack to survive a disaster and what is coming in the Kilo release that will simplify the process. The presentation should be rooted in reality and examples. How do I know? They will be running a demo at the IBM booth!

15:20
Why is my Volume in ‘ERROR’ State!?! An Introduction to Troubleshooting Your Cinder Configuration

Jay Bryant
Jay Bryant, Cinder core contributor, will guide users through getting volumes into the “AVAILABLE” state, whether bringing up new Cinder installations or adding additional storage back ends to an existing Cinder installation. Jay will address the most common pitfalls. In addition to being a Cinder core, he is also the resident Cinder subject matter expert at IBM and is consulted on most issues related to Cinder.

16:20
Group Based Policy Extension for Networking
Mohammad Banikazemi joint with Cisco, One Convergence
Mohammad, an IBM Research staff member, is teaming up with three other Neutron contributors to introduce the new Group Policy Extension that facilitates clear separation of concerns between the application and infrastructure administrator and to demo the latest version of the code end-to-end. The demo should highlight the best features of this extension. Mohammad authored the IBM software-defined network for virtual environments (SDN VE) Group Policy extension and understands the capabilities as well as anyone.

17:50
Practical Advice on Deployment and Management of Enterprise Workloads
Jarek Miszczyk, Venkata Jagana
Jarek is responsible for working with IBM teams building on OpenStack, and Venkata is responsible for influencing the IBM Global Technology Services Software Defined Environments initiative. Together, they’ll be sharing best practices for using Heat and HOT templates to deploy and manage more traditional enterprise workloads. Expect well-tested and proven advice; Jarek and Venkata have been building advanced Heat and HOT Templates to orchestrate complex deployments for almost a year.

Tuesday, November 4

11:15
The Perfect Match: Apache Spark Meets Swift
Gil Vernik and Michael Factor joint with Databricks
Michael Factor is back on the second day with Gil Vernik, another leading cloud storage researcher at IBM. They’ll be joined by the co-founder of Databricks, recognized as one of the leading Apache Spark companies, to introduce this emerging project, its integration with Swift and to demonstrate some advanced models. You should walk away with an understanding of what has driven Spark’s rapid ascension; Databricks has built its platform (and company) on the technology.

14:50
A Practical Approach to Dockerizing OpenStack High Availability
Manuel Silveyra, Shaun Murakami, Kalonji Bankole, Daniel Krook

IBM Cloud Labs experts, the leading OpenStack deployment and operations team at IBM, will take you through their work to improve OpenStack High Availability (HA) by moving the OpenStack services into Docker containers, ultimately showing it in action. Expect to learn whether deploying OpenStack in Docker to improve HA is practical in your environment. As a team of “do-ers,” they will deliver the content from an operator’s practical point of view, not a theoretical one.

15:40
Docker Meets Swift: A Broadcaster’s Experience
Eran Rom joint with Radiotelevisione Italiana (RAI)
Eran, another member of the IBM cloud storage research team from Haifa Labs, will join with a member of Italy’s national public broadcasting company to show how RAI addressed their problem of growing costs from new storage-hungry media formats using the Haifa-developed “storlets” concept that combines Docker with Swift. These experts will explain what storlets are and how they solve real-world problems. Eran was part of the core development team of the technology and helped RAI get it into a first-of-a-kind deployment.

16:40
User Group Panel: India, Japan, China
Guang Ya Liu, joint with six other OpenStack community organizers
Tom Fifield, Community Manager for the OpenStack Foundation, will moderate this discussion of OpenStack community organizers that includes Guang Ya Liu, co-organizer of the OpenStack Xi’an (China) Meet Up and Cinder core contributor. Expect to walk away knowing how to find a local OpenStack community or start your own, and you’ll get some good advice for keeping momentum in your group. All six panelists have either started a local user group, meet-up or conference, or have presented to these groups on many occasions.

Wednesday, November 5

9:00
Monasca DeepDive: Monitoring at Scale
Tong Li and Rob Basham joint with HP and Rackspace
Tong is one of the earliest OpenStack contributors from IBM, and Rob is focused on monitoring and automation in the systems management space. They join with long-time OpenStack contributor Sandy Walsh of Rackspace and Roland Hochmuth of HP to introduce a new cloud-scale, multitenant, high performance, fault-tolerant OpenStack monitoring as a service (MONaaS) platform project called Monasca. You’ll see a project that is ready to use today and solves a real challenge; Sandy’s original StackTach project really set the bar for this type of work in OpenStack.

9:00
Managing Multi-platform Environments with OpenStack
Shaun Murakami and Philip Estes
The IBM Cloud Labs team is back to take you through lessons learned deploying and managing OpenStack in a heterogeneous environment running disparate workloads, including touching on challenges such as regions and federated identities. Again, expect practical advice for the operator—this is a team of “do-ers.”

9:50
Troubleshooting Problems in Heat Deployments
Fabio Oliveira, Ton Ngo, Priya Nagpurkar, Winnie Tsang
This is the third session with members of the IBM Cloud Labs team, this time pairing up with two IBM Research staff members to talk through the ad hoc methods by which OpenStack Heat failures were troubleshot in the past, recent improvements that will make those methods easier and what is coming in the future. You will likely see some innovative user-centric approaches to simplifying Heat troubleshooting, including a demo of a template debugger inspired by traditional programming language debuggers.

11:50
Keystone to Keystone Federation Enhancements for Hybrid Cloud Enablement
Brad Topol and Steve Martinelli joint with Rackspace and CERN
Two top Keystone experts from IBM—Distinguished Engineer Brad Topol and Keystone Core Steve Martinelli—are partnering with CERN and Rackspace to describe the recently contributed Keystone-to-Keystone federated identity enhancements to support hybrid cloud scenarios. This session should provide a healthy dose of real-world, customer driven use cases, as CERN is the original OpenStack superuser.

I hope this guide helps make a few decisions a little easier. If you still have questions on any of these sessions, find me on Twitter @mjfork.

The post IBM technical sessions at the OpenStack Summit in Paris appeared first on Thoughts on Cloud.

by Michael J. Fork at October 14, 2014 04:59 PM

Adam Young

Who can sign for what?

In my last post, I discussed how to extract the signing information out of a token. But just because the signature on a document is valid does not mean that the user who signed it was authorized to do so. How can we go from a signature to validating a token? Can we use that same mechanism to sign other OpenStack messages?

The following is a proposed extension to Keystone client based on existing mechanisms.

Overview

  1. Extract signer data out of the certificates
  2. Fetch the complete list of certificates from Keystone using the OS-SIMPLE-CERT extension
  3. Match the signer to the cert to validate the signature and extract the domain data for the token
  4. Fetch the mapping info from the Federation extension
  5. Use the mapping info to convert from the signing cert to a keystone user and groups
  6. Fetch the effective roles from Keystone for the user/groups for that domain
  7. Fetch policy from Keystone
  8. Execute the policy check to validate that the signer could sign for the data.

We need a method to go from the certificate used to sign the document to a valid Keystone user. Implied in there is that everything signed in an OpenStack system is going to be signed by a Keystone user. This is an expansion on how things were done in the past, but there is a pretty solid basis for this approach: in Kerberos, everything is a Principal, whether user or system.

From Tokens to Certs

The Token has the CMS Signer Info.  We can extract that information as previously shown.

The OS-SIMPLE-CERT extension has an API for fetching all of the signing certs at once.
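As an illustration (not part of the original post; the endpoint URL is a placeholder), the fetch is a simple GET against the v3 API:

```python
# Hedged sketch: pull down the signing and CA certificates published by the
# OS-SIMPLE-CERT extension. Both responses are PEM-encoded text.
import requests

KEYSTONE = "https://keystone.example.com:5000/v3"

signing_certs = requests.get(KEYSTONE + "/OS-SIMPLE-CERT/certificates").text
ca_cert = requests.get(KEYSTONE + "/OS-SIMPLE-CERT/ca").text
```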

This might not scale greatly, but it is sufficient for supporting a proof-of-concept.  It reduces the problem of “how to find the cert for this token” down to a match between the signing info and the attributes of the certificates.

To extract the data from the certificates, we can Popen the OpenSSL command to validate a certificate.  This is proper security practice anyway: while we trust the authoritative Keystone, we should verify whenever possible.  It will be expensive, but the result can be cached and reused, so it should not have to happen very often.
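A rough sketch of that step (an assumption about the eventual implementation, not code from python-keystoneclient) could shell out to OpenSSL like this:

```python
# Hedged sketch: verify a signing certificate against the CA and extract its
# subject, which is then fed into the mapping step described below.
import subprocess

def verify_and_describe(cert_path, ca_path):
    # "openssl verify" prints "<file>: OK" on success, so check the output
    # rather than trusting the exit code alone.
    out = subprocess.check_output(
        ["openssl", "verify", "-CAfile", ca_path, cert_path]).decode("utf-8")
    if not out.strip().endswith("OK"):
        raise ValueError("certificate failed verification: %s" % out)
    # Extract the subject for use in the federation mapping.
    subject = subprocess.check_output(
        ["openssl", "x509", "-in", cert_path, "-noout",
         "-subject"]).decode("utf-8")
    return subject.strip()
```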

From Certs to Users

To translate from a certificate to a user, we need to first parse the data out of the certificate. This is possible doing a call to OpenSSL. We can be especially efficient by using that call to validate the certificate itself, and then converting the response to a dictionary. Keystone already has a tool to convert a dictionary to the Identity objects (user and groups): the mapping mechanism in the Federation backend. Since a mapping is in a file we can fetch, we do not need to be inside the Keystone server to process the mapping, we just need to use the same mechanism.

Mappings

The OS-FEDERATION extension has an API to List all mappings.

And another to get each mapping.
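For illustration (placeholder URL and token; not from the original post), both calls are plain GETs on the v3 API:

```python
# Hedged sketch: list the federation mappings and fetch the rules for each,
# so the same attribute-to-user translation can run outside the Keystone
# server.
import requests

KEYSTONE = "https://keystone.example.com:5000/v3"
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN"}

mappings = requests.get(KEYSTONE + "/OS-FEDERATION/mappings",
                        headers=HEADERS).json()["mappings"]

rules_by_id = {}
for mapping in mappings:
    detail = requests.get(KEYSTONE + "/OS-FEDERATION/mappings/" + mapping["id"],
                          headers=HEADERS).json()["mapping"]
    # Each mapping carries the rules used to turn certificate attributes
    # into a Keystone user and its groups.
    rules_by_id[mapping["id"]] = detail["rules"]
```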

Again, this will be expensive, but it can be cached now, and optimized in the future.

The same process that uses the mappings to translate the env-vars for an X509 certificate  to a user inside the Keystone server can be performed externally. This means extracting code from the Federation plugin of the Keystone server to python-keystoneclient.

From User to Roles

Once we have the users and groups, we need to get the role data appropriate to the token. This means validating the token and extracting the domain for the project. Then we will use the Identity API to list effective role assignments.
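An illustrative query (placeholders throughout, and an assumption about the exact filters) against the v3 role assignments API might look like:

```python
# Hedged sketch: list the effective role assignments for one user on the
# domain extracted from the token. A similar call with a "group.id" filter
# would be repeated for each group produced by the mapping.
import requests

KEYSTONE = "https://keystone.example.com:5000/v3"
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN"}

resp = requests.get(KEYSTONE + "/role_assignments",
                    params={"user.id": "USER_ID",
                            "scope.domain.id": "DOMAIN_ID",
                            "effective": "true"},
                    headers=HEADERS)
assignments = resp.json()["role_assignments"]
```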

We’ll probably have to call this once for the user ID and then once for each of the groups from the mapping in order to get the full set of roles.

From Roles to Permission

Now, how do we determine if the user was capable of signing for the specified token? We need a policy file. Which one? The one abstraction we currently have is that a policy file can be associated with an endpoint. Since keystone is responsible for controlling the signing of tokens, the logical endpoint is the authoritative keystone server where we are getting the certificates etc:

We get the effective policy associated with the keystone endpoint using the policy API.

And now we can run the user’s RBAC through the policy engine to see if they can sign for the given token.  The policy engine is part of oslo common.  There is some “flattening” code from the Keystone server we will want to pull over.  Both of these will again land in python-keystoneclient.

Implementation

This is a lot of communication with Keystone, but it should not have to be done very often: once each of these API calls have been made, the response can be cached for a reasonable amount of time. For example, a caching rule could say that all data is current for a minimum of 5 minutes. After that time, if a newly submitted token has an unknown signer info, the client could refetch the certificates. The notification mechanism from Keystone could also be extended to invalidate the cache of remote clients that register for such notifications.

For validating tokens in remote endpoints, the process will be split between python-keystoneclient and keystonemiddleware.  The Middleware piece will be responsible for cache management and maintaining the state of the validation between calls.  Keystone Client will expose each of the steps with a parameter that allows the cached state to be passed in, as well as already exposing the remote API wrapping functions.

At this stage, it looks like no changes will have to be made to the Keystone server itself.  This shows the power of the existing abstractions.  In the future, some of the calls may need optimization.  For example, the fetch for certificates may need to be broken down into a call that fetches an individual certificate by its signing info.

by Adam Young at October 14, 2014 04:17 PM

Sébastien Han

See you at the CephDay London 2!

This second edition of the CephDay London looks really promising. You should definitely look at the agenda! Talks range from OpenStack to deep performance studies, with CephFS news along the way!

Check this out on the Eventbrite page.


I hope to see you there! I don’t have any talks this time, so for once I’ll just be watching :-).

October 14, 2014 03:33 PM

The Official Rackspace Blog » OpenStack

Accelerating Science With OpenStack At Notre Dame

By Paul R. Brenner, Associate Director, Center for Research Computing at the University of Notre Dame

At the Center for Research Computing (CRC) at the University of Notre Dame we offer a heterogeneous high-performance computing environment where faculty, students and others can conduct scientific research. From simulating hurricanes to digging into the makeup of molecules, our researchers must have the computational resources to try new, inventive things, which our infrastructure must be able to support.

For our researchers to be successful, our infrastructure must be quick and agile. Our scientists need to be able to write software and have it up and running in days versus the months or year it could take in a traditional enterprise environment. Our priority for agility and speed is much higher than an enterprise’s. Our scientists must have the computing resources to effectively explore theories and validate hypotheses.

We run a heterogeneous environment with about 1,600 to 1,750 users and about 1,500 servers, 21,000 cores and 2 PB of spinning disk storage. We connect out to the rest of the world over 10 gig links and via Internet2 to national labs. We selected OpenStack and Rackspace Private Cloud as a key component of our research cloud because we need to be dynamically flexible.

For scientists to collaborate at Notre Dame and at universities around the world, we have open languages we all speak. Sharing that common language in a computing sense means having a common API like what OpenStack provides.

Our OpenStack cloud allows our scientists to spin up and spin down resources as their research requires. OpenStack gives our users a self-service model – they can run it themselves and create highly agile systems that they can bring up and bring down.

OpenStack also opens the door to federation between multiple universities. We are working to integrate with universities and labs in Poland, India, you name it, via an OpenStack cloud and share data.

OpenStack allows us to modify our cloud to fit into custom infrastructures that may be necessary for scientific research. As a member of the community, we can add features and capabilities – if we want to make a grid scheduler talk to the OpenStack API, we can do that. With proprietary platforms, you lose the option to customize.

With an OpenStack-powered Rackspace Private Cloud we found a robust platform for production environments. As a founder of OpenStack, Rackspace has cloud credibility and technical expertise in managing the OpenStack platform.

We did the comparisons, and even Rackspace’s biggest OpenStack competitor doesn’t offer the service and support model we demand. Fanatical Support is truly a key differentiator.

Want to hear more about how the Center for Research Computing at the University of Notre Dame uses OpenStack and Rackspace Private Cloud to accelerate scientific research? Associate Director Paul Brenner is speaking at Rackspace::Solve Chicago on Monday, October 20 at The Peninsula. Other companies scheduled to speak at Rackspace::Solve Chicago include Kendra Scott, Razorfish, Zipline Labs and more. The event is free and open to everyone. Register now.

by Paul R. Brenner at October 14, 2014 03:00 PM

October 13, 2014

ICCLab

Wanted: Network Function Virtualization Pro

Are you excited about and interested in Network Function Virtualisation (NFV), one of the big trends driving both industry and research communities? Do you believe that you could bring new innovations and ideas to this area in a driven, exciting environment? Would you know how to use OpenStack or CloudStack to realise this architecture in a cloud-native way? If so, this position will certainly grab your attention.

Here at the ICCLab we are seeking someone to take on the world of NFV. The details of the position can be viewed here.

 

 

by Andy Edmonds at October 13, 2014 06:33 PM

Matt Farina

Connecting to OpenStack in node.js with pkgcloud

If you've ever wanted to build a node.js app that works with OpenStack, HP Helion OpenStack, or the HP Public Cloud, the package to use is pkgcloud. This is a multi-cloud library for node.js that includes support for some of the common components of OpenStack. It's supported enough by the OpenStack community that it's the node.js library listed on the OpenStack developer portal.

Note, until recently there was the hpcloud-js library that I've now deprecated because it is superseded by pkgcloud. hpcloud-js served us well, but given the support for OpenStack in pkgcloud and the superior feature coverage, we've decided to direct users there instead.

Creating A Client

Let's look at an example where we create an object storage client. This client can be used to manage objects, containers, meta-data, and more.

var client = require('pkgcloud').storage.createClient({
    provider: 'openstack',
    username: 'your-user-name',
    password: 'your-password',
    tenantId: 'exampleProject',
    region: 'exampleRegion',
    authUrl: 'https://identity.example.com/v2.0/'
});

If you're connecting to the HP Public Cloud and would like to use your access keys instead of a username and password, the hp provider can be used.

Retrieval Example

Given the client in the previous example, we can retrieve information from the endpoint. For example, we can get a list of containers like so.

client.getContainers(function(err, containers) {
    // Containers is an array of container objects.
})

Examples and Documentation

There's a bit of documentation and examples in the pkgcloud docs. If you're looking to dive in, it's worth starting there.

Continue Reading »

October 13, 2014 02:00 PM

Michael Still

One week of Nova Kilo specifications

It's been one week of specifications for Nova in Kilo. What are we seeing proposed so far? Here's a summary...

API



Administrative

  • Enable the nova metadata cache to be a shared resource to improve the hit rate: review 126705.


Containers Service



Hypervisor: FreeBSD

  • Implement support for FreeBSD networking in nova-network: review 127827.


Hypervisor: Hyper-V

  • Allow volumes to be stored on SMB shares instead of just iSCSI: review 102190.


Hypervisor: VMWare

  • Add ephemeral disk support to the VMware driver: review 126527 (spec approved).
  • Add support for the HTML5 console: review 127283.
  • Allow Nova to access a VMWare image store over NFS: review 126866.
  • Enable administrators and tenants to take advantage of backend storage policies: review 126547 (spec approved).
  • Support the OVA image format: review 127054.


Hypervisor: libvirt

  • Add a new linuxbridge VIF type, macvtap: review 117465.
  • Add support for SMBFS as an image storage backend: review 103203.
  • Convert to using built in libvirt disk copy mechanisms for cold migrations on non-shared storage: review 126979.
  • Support libvirt storage pools: review 126978.
  • Support quiescing filesystems during snapshots: review 126966.


Instance features

  • Allow direct access to LVM volumes if supported by Cinder: review 127318.


Internal

  • Move flavor data out of the system_metadata table in the SQL database: review 126620.


Internationalization



Scheduler

  • Add an IOPS weigher: review 127123 (spec approved).
  • Allow limiting the flavors that can be scheduled on certain host aggregates: review 122530.
  • Create an object model to represent a request to boot an instance: review 127610.
  • Decouple services and compute nodes in the SQL database: review 126895.
  • Implement resource objects in the resource tracker: review 127609.
  • Move select_destinations() to using a request object: review 127612.


Scheduling

  • Add instance count on the hypervisor as a weight: review 127871.


Security

  • Provide a reference implementation for console proxies that uses TLS: review 126958.
  • Strongly validate the tenant and user for quota consuming requests with keystone: review 92507.


Tags for this post: openstack kilo blueprints spec
Related posts: Compute Kilo specs are open; Blueprints to land in Nova during Juno; On layers; My candidacy for Kilo Compute PTL; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno Nova PTL Candidacy

Comment

October 13, 2014 11:27 AM

ICCLab

8th Swiss OpenStack and Docker User Group meeting – announcement


OpenStack User Group – Meeting, 16 Oct. at ICCLab Winterthur

Co-located with docker CH meeting

Goals: Analysis of OpenStack solutions, deployments and container solutions.

Hosting:

ZHAW Zurich University of Applied Sciences
Technikumstrasse 9, 8401 Winterthur
Date: 16.10.2014 – 18:00 – 21:00

Agenda start: 18:00 – ROOM TL203 (Chemistry Building)

(order of speakers may change)

– Intro & Welcome 5 mins (Florian & Antonio)
– Peter Mumenthaler – Puzzle ITC – “Docker, blessing or curse?” (15m)
– Marco Kueding and Rolf Schaerer (Cisco CH) – “Intercloud and Cisco OpenStack strategy” (35m)
– Michael Erne, ZHAW ICCLab – “Manage Docker at scale with Kubernetes” (15m)

Drink break

– Jesper Kuhl, Nuage Networks & Alcatel Lucent – “VSP – Virtualized Services Platform” (25m)
– Srikanta Patanjali, ZHAW ICCLab – “Updates on CYCLOPS – A Charging platform for OpenStack Clouds” (20m)
– Alexander Gabert, Cynthia, “Network Virtualization” (20m)

– Common Wrap up and apero

Looking forward to seeing you all!
Snacks and drinks kindly offered by ZHAW and Mirantis

by Antonio Cimmino at October 13, 2014 08:27 AM

Opensource.com

Making the case for OpenStack, mentoring, and more

Interested in keeping track of what's happening in the open source cloud? Opensource.com is your source for what's happening right now in OpenStack, the open source cloud infrastructure project.

by Jason Baker at October 13, 2014 07:00 AM

Michael Still

Compute Kilo specs are open

From my email last week on the topic:
I am pleased to announce that the specs process for nova in kilo is
now open. There are some tweaks to the previous process, so please
read this entire email before uploading your spec!

Blueprints approved in Juno
===========================

For specs approved in Juno, there is a fast track approval process for
Kilo. The steps to get your spec re-approved are:

 - Copy your spec from the specs/juno/approved directory to the
specs/kilo/approved directory. Note that if we declared your spec to
be a "partial" implementation in Juno, it might be in the implemented
directory. This was rare however.
 - Update the spec to match the new template
 - Commit, with the "Previously-approved: juno" commit message tag
 - Upload using git review as normal

Reviewers will still do a full review of the spec, we are not offering
a rubber stamp of previously approved specs. However, we are requiring
only one +2 to merge these previously approved specs, so the process
should be a lot faster.

A note for core reviewers here -- please include a short note on why
you're doing a single +2 approval on the spec so future generations
remember why.

Trivial blueprints
==================

We are not requiring specs for trivial blueprints in Kilo. Instead,
create a blueprint in Launchpad
at https://blueprints.launchpad.net/nova/+addspec and target the
specification to Kilo. New, targeted, unapproved specs will be
reviewed in weekly nova meetings. If it is agreed they are indeed
trivial in the meeting, they will be approved.

Other proposals
===============

For other proposals, the process is the same as Juno... Propose a spec
review against the specs/kilo/approved directory and we'll review it
from there.


After a week I'm seeing something interesting. In Juno the specs process was new, and we saw a pause in the development cycle while people actually wrote down their designs before sending the code. This time around people know what to expect, and there are left over specs from Juno lying around. We're therefore seeing specs approved much faster than in Juno. This should reduce the effect of the "pipeline flush" that we saw in Juno.

So far we have five approved specs after only a week.

Tags for this post: openstack kilo blueprints spec
Related posts: One week of Nova Kilo specifications; Blueprints to land in Nova during Juno; On layers; My candidacy for Kilo Compute PTL; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno Nova PTL Candidacy

Comment

October 13, 2014 12:39 AM

October 12, 2014

Jon Proulx

Should It Be Easy to Install OpenStack?

As part of my glorious position on the OpenStack User Committee (note to self: submit bio for user committee page) I recently went through all 619 free response comments collected by the user survey in the past 12 months. The point of that exercise was to come up with a very coarse “Top 10” list to be presented with the tabulated survey results in Paris.

One of the things that kept coming up (though in the end it finished out of the money, so no spoilers here) was making it easier to install OpenStack. Certainly this is an area that could use improvement, but how much can it be improved, and is it even OpenStack’s business to worry about installers?

When wearing my ‘user committee’ hat I had to be neutral and try to interpret the anonymous responses I was looking at as faithfully as possible without any editorializing. Well, I’m taking that hat off now and putting on my more usual ‘grumpy old bastard’ hat, so hold on kids, it’s going to be a bumpy ride…

You will never get a unicorn.

Your package manager cannot help you here, apt-get install openstack will never do what you want (nor will yum install or emerge or any of the other variants Linux or otherwise). There are simply too many pieces and too many choices for this to ever really be practical, so just get that idea out of your silly little head.

One of OpenStack’s strengths is its flexibility and modularity. I’m of the opinion that this is the main reason it is currently dominant in the private cloud space while more monolithic projects (ahem, Eucalyptus) have all but faded away. It’s certainly why I threw away my Eucalyptus based cloud about 2 years ago, but I digress.

Flexibility necessarily brings some complexity. Now I think choice is good, but perhaps you think things have gotten a bit out of hand. Perhaps we have gone too far in the trade off between flexibility and simplicity, but even if we could put that genie back in the bottle and divide the available options by 10, you’d still be there all day long answering preinstall questions.

The main problem isn’t even the millions of config options, it’s how setting those options affects what packages you need to install in the first place. Which services will you provide? Keystone, Nova, and Glance are pretty much required for an IaaS install (we’ll assume a different meta-package for a storage focused install that wouldn’t need those), but what about Cinder, and if Cinder, what storage backend? Even with the most minimal (and somewhat unrealistic) IaaS requirements, what hypervisor will you use, and what storage will you use for Glance? These both have a big impact on what packages you need. We’ll wave hands over the Neutron options because those are just crazy, but you do need to consider which RPC backend to use, and I’d really like to see a choice of database backend as well.

“But choice doesn’t mean we can’t have reasonable defaults,” you say? OK, that’s true. Suppose there’s a magic package that makes all these choices and leaves it to you to install piece by piece if you want to make different choices.

Well, that unicorn isn’t shooting rainbows out its ass anymore, but it’s still a unicorn, I’ll grant. Here’s the final kicker: OpenStack is fundamentally a distributed system that can be decomposed in multiple ways. A single controller, or a separate controller and network node, are the two most popular; will there be separate Cinder nodes, or will that be on the controller?

Now maybe we can assume more defaults to simplify this, like a single non-HA controller that will do everything, with compute nodes added to talk to it later. Well, that isn’t a unicorn anymore, it’s a toy. I’ve got nothing against playing with toys; they are useful for simple testing and experimental learning. If that’s what you’re looking for you probably want a pony. I’ll talk about those later.

But I saw one over there.

“That last argument was a straw man; there are plenty of ways to get my unicorn that don’t rely on anything as crude as package managers.” Well, that’s true, there are better ways to try to get your unicorn, but I didn’t make up that argument. Someone actually wanted to avoid ‘kludging’ around with configuration management systems.

So yes, you can get better install automation by leveraging the configuration management system of your choice. In fact you’re a bit mad if you don’t, but this is not a unicorn either; it’s just good sense, and it takes a deep understanding of OpenStack to configure properly. Perhaps less so than just installing raw, as you can play a bit of “fill in the blanks” to get things going, but still somewhat less than magical.

TripleO? Yes, fine that may be magical. The OpenStack on OpenStack, OpenStack as a Service, infinitely recursive, inceptiony thing may in fact have enough mojo to do this in its perfected state. I haven’t been following just how close its current state is to this, but I will grant that it may grow up to be a unicorn if it isn’t already.

You don’t really want a unicorn anyway.

Well not at first.

OpenStack in production is probably going to be a fairly important part of your infrastructure. What is the point of a cloud if not to run many more systems on the cloud than under it?

For something with your whole computational world riding on it, you NEED to understand what the pieces are and how they are put together in a very deliberate way. When something goes wrong you need to be able to understand what is happening.

I’ve never met a piece of software that hasn’t broken on me at some point in some way. Software (and hardware for that matter) that is in the critical path demands either a commercial support contract with a very short response time or equivalent local expertise. Being local expertise myself, I of course prefer the latter; there are plenty of people in the ecosystem who will sell you the former as well.

Unicorns get you neither of these things. If there were a silver bullet, answer three questions and have a working OpenStack installer, what would you do when something went wrong at 3am on a Saturday?

Perhaps a pony then?

Yes a pony would be nice.

Toys for learning

Being able to quickly set up a working OpenStack environment to evaluate whether it is worth looking into more deeply is an entirely reasonable goal. As is getting a running cloud so you can learn how it behaves through observing and deconstructing it. Both of these are a long way from production and can be expected to be done with little to no OpenStack experience.

I used FAI, which we use for PXE-based bootstrapping around here, and Puppet to do the actual OpenStack config bits for my first install (after an abortive attempt at using MaaS, which conflicted with too many existing services on my network). I had a working multi node proof of concept system up in less than 16 person hours, from deciding to hunt up abandoned hardware to having test users launching VMs. Granted, this was during the Essex release, so pre-Neutron, somewhat simpler times.

Today the official install guides would probably be both faster and better, since you can see the individual pieces fitting together as you assemble them rather than first taking a magical leap and then having to deconstruct what it did, as is the case with Puppet (or your tool of choice, most likely).

Both these methods are a bit more time investment than one would really want for the initial evaluation. DevStack will get you something quickly without too much investment, but as a developer tool it’s not quite an ideal operator evaluation tool. An introductory level unicorn would be a good creature to have, so long as people understand its mythical nature and don’t try to ride it all the way into production.

Making a plan

Even a pony can’t help you here.

Once you’ve come to understand the basics of how OpenStack is put together from the above {de,re}construction, it’s time to find the union of what you want with what you can have, and hope that it includes what you need (it probably will).

The only way to do this is to do it. Iterate through configs either by hand or preferably through your config engine until you get what you want.

Only Docs can help you here. Now, the docs team works amazing hours and does great work, but keeping up with the pace of OpenStack is near impossible. If you find something confusing, outdated, or just plain wrong, there’s a little red bug icon on the top right of every page; click that and report it.

Ultimately in our perfectly documented world this iteration is still necessary if only to prove the behavior described is actually what you want.

Executing the plan

Once the hard design work is done, you just feed that into your existing configuration management system, like you’ve been doing for all your server configs for years, right?

Pretty much, yes. Different tool communities have different bumps in their particular processes, but if you pick the devil that you know, you can pretty much anticipate what those are.

So how is this OpenStack’s business?

Well in my opinion it’s not really.

In a broad sense, reducing overall complexity to the extent we can and choosing reasonable defaults so fewer of the myriad config options need to be set are good goals that are in scope and would help simplify installs. But that’s really a side effect; simplifying ongoing management is the main need for those. You install infrequently (maybe once), but systems management is forever.

Packaging is your distro’s business, which is as it should be. For example, when have you ever tried to install X from upstream, or install Linux From Scratch? These are instructive experiences, but you probably wouldn’t base your infrastructure on doing that. Of course if you are deploying from trunk then you don’t need packages, but you do need a CI/CD team and have more complex problems than any dreamed of here.

Configuration management is the union of the OpenStack and tool communities. You’ll see sessions for various tools at Summits, midcycle OPS meetups, and on StackForge, but they are not properly OpenStack projects.

You’re of course free to disagree, and build a native magical installer if you want. As I said, choice is good. But if you’re going to hate on how hard OpenStack is to install, try to remember what it was like to install a GNU/Linux operating system when it was four years old (say 1997, when the Debian distro was 4). OpenStack is a dream compared to that, so quit whining and get to work.

October 12, 2014 03:30 PM

October 11, 2014

Florent Flament

OpenStack Swift Ring made understandable

When people talk about OpenStack Swift, we often hear the word Ring. This is because the Ring is a central piece of how Swift works. But what is this thing everyone's talking about?

The Ring refers to 3 files that are shared among all Swift nodes (storage and proxy nodes):

  • object.ring.gz
  • container.ring.gz
  • account.ring.gz

There is actually one ring per type of data manipulated by Swift: Objects, Containers and Accounts. These files determine on which physical devices (hard disks) each object (and also each container and account) will be stored. The number of devices on which an object is stored depends on the number of replicas (copies) specified for the Swift cluster.

How does it concretely work

When receiving an object to store, Swift computes an MD5 hash of the object's full name (including its account's and container's names). Part of this hash is kept and interpreted by Swift as the partition number. The length of the hash segment kept depends on the number of partitions that has been set for the Swift cluster; this number is necessarily a power of 2, so that if we keep n bits of the hash, we have 2^n partitions.

The object ring is a map that associates each partition to a specific physical device. This mechanism is then repeated for every object's replicas, and also for containers and accounts.
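As a rough sketch of the idea (not Swift's exact code, which also mixes per-cluster hash path prefix and suffix values into the hash), the partition can be derived from the leading bits of the MD5 hash of the object's full name:

import hashlib
import struct

def get_partition(account, container, obj, part_power):
    # Hash the object's full name; real Swift also wraps this path with
    # per-cluster hash path prefix/suffix values before hashing.
    path = '/%s/%s/%s' % (account, container, obj)
    digest = hashlib.md5(path.encode('utf-8')).digest()
    # Interpret the first 4 bytes as a big-endian unsigned integer and
    # keep only the top part_power bits: that is the partition number.
    return struct.unpack('>I', digest[:4])[0] >> (32 - part_power)

# With 2^3 = 8 partitions, every object maps to a partition in 0..7.
print(get_partition('AUTH_test', 'photos', 'cat.jpg', 3))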

To be more specific, the object's ring has 3 components:

  • What is referred to in the code as the _replica2part2dev table (whose name is relatively explicit, as we'll see later on)
  • The table of devices describing each device
  • The length of an object's hash to consider as the partition number

The _replica2part2dev structure is a 2-dimensional table, so that for any (replica number, partition number) couple, the table indicates the physical device, where the object should be stored.

The devices table contains all the information that a Swift node needs in order to reach a given device; it consists mostly of the device's storage node's IP address, the TCP port to use, and the physical device name on the storage node.

In the end, the Ring is composed of 2 tables and one integer. If I were to choose a name for such a structure, I would call it the Table. I couldn't find any explanation of why the name Ring was adopted, but my guess is that some previous algorithm may have used some modular computation, which people tend to represent using rings.

Example

Here is a simple example to make everything clear. Let's consider a Swift cluster with 2 storage nodes, with the following IP addresses: 192.168.0.10 and 192.168.0.11. Each storage node has two devices: sdb1 and sdc1.

An example of _replica2part2dev table with 3 replicas, 8 partitions and 4 devices would be:

r
e  |   +-----------------+
p  | 0 | 0 1 2 3 0 1 2 3 |
l  | 1 | 1 2 3 0 1 2 3 0 |
i  | 2 | 2 3 0 1 2 3 0 1 |
c  v   +-----------------+
a        0 1 2 3 4 5 6 7
       ------------------>
           partition

The table has 3 lines, one for each replica, and 8 columns, one for each partition. To find the device storing replica number 1 of partition number 2, we select the line of index 1 and the column of index 2. This leads us to device ID 3.

The devices table is very similar to what we can obtain by using the swift-ring-builder with only the builder file as argument:

$ swift-ring-builder mybuilder 
mybuilder, build version 4
8 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 0
Devices:    id  region  zone      ip address  port  replication ip  replication port      name weight partitions balance meta
             0       1     1    192.168.0.10  6000    192.168.0.10              6000      sdb1 100.00          6    0.00 
             1       1     1    192.168.0.10  6000    192.168.0.10              6000      sdc1 100.00          6    0.00 
             2       1     1    192.168.0.11  6000    192.168.0.11              6000      sdb1 100.00          6    0.00 
             3       1     1    192.168.0.11  6000    192.168.0.11              6000      sdc1 100.00          6    0.00

The device of ID 3 can be found on server 192.168.0.11, port 6000, device name sdc1.
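To make the lookup concrete, here is a small sketch (using hand-written tables that mirror the example above, not data read from an actual ring file) that resolves every replica of a partition to its device:

# The example _replica2part2dev table: one row per replica,
# one column per partition, values are device IDs.
replica2part2dev = [
    [0, 1, 2, 3, 0, 1, 2, 3],  # replica 0
    [1, 2, 3, 0, 1, 2, 3, 0],  # replica 1
    [2, 3, 0, 1, 2, 3, 0, 1],  # replica 2
]

# The example devices table, indexed by device ID.
devices = [
    {'ip': '192.168.0.10', 'port': 6000, 'device': 'sdb1'},
    {'ip': '192.168.0.10', 'port': 6000, 'device': 'sdc1'},
    {'ip': '192.168.0.11', 'port': 6000, 'device': 'sdb1'},
    {'ip': '192.168.0.11', 'port': 6000, 'device': 'sdc1'},
]

def nodes_for_partition(partition):
    # For each replica row, look up the device ID assigned to this partition.
    return [devices[row[partition]] for row in replica2part2dev]

for replica, dev in enumerate(nodes_for_partition(2)):
    print(replica, dev['ip'], dev['port'], dev['device'])
# Replica 1 of partition 2 lands on 192.168.0.11:6000, device sdc1.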

Simple is good

What I like about this mechanism is that the smartness of the data placement is performed by the swift-ring-builder, a standalone tool provided with Swift. Once the rings have been built, the Swift processes running on the Swift nodes have a fully deterministic and easily predictable behavior.

The swift-ring-builder manipulates builders files; these are files containing architectural information about the Swift cluster (like distribution of devices and nodes among regions and zones). These builders are then used to generate the rings files. As with the rings, there is one builder per type of data (objects, containers and accounts).

Thanks to this mechanism the complexity of smartly storing objects has been well separated between:

  • Smartly assigning partitions (and corresponding objects, containers and accounts) to devices, taking into account the cluster's architecture. This is performed by the swift-ring-builder

  • Ensuring that files are stored uncorrupted at the appropriate locations; this is performed by the processes running on the Swift nodes.

More

For more information about the ring, one can read the Swift's developer documentation about the Ring.

by Florent Flament at October 11, 2014 08:00 PM

October 10, 2014

IBM OpenStack Team

What’s new in IBM Cloud Orchestrator 2.4?

IBM Cloud Orchestrator 2.4 has just been released. It is the follow-up to IBM SmartCloud Orchestrator 2.3. This new version takes some key steps toward stronger integration with OpenStack:

• You can use OpenStack Heat templates and deploy Heat stacks by way of the self-service interface.

• There is a brand new administrative user interface based on OpenStack Horizon that allows you to easily manage your infrastructure.

• The homegrown support for the VMware hypervisor that was included in SmartCloud Orchestrator 2.3 has been replaced with the OpenStack VMware driver.

• It is possible to exploit OpenStack Neutron networks.

IBM Cloud Orchestrator administrative user interface

The image management logic has been simplified, eliminating the need for the Virtual Image Library and Image Construction and Composition Tool. You can now build images that are compatible with IBM Cloud Orchestrator by utilizing cloud-init/cloudbase-init, and you can deploy single instances without having to build a virtual system pattern or virtual application pattern. This simplifies the architecture and can improve serviceability and maintainability.

On the orchestration side, a few enhancements have been made:

• You can delegate roles and access to resources to the cloud administrator, domain administrator, service designer and user.

• You can start, stop or delete images created outside of the IBM Cloud Orchestrator environment, map them to OpenStack projects and run orchestration actions on them.

• If you want to reach out to public cloud resources, you can deploy instances and orchestrate resources not only on Amazon EC2, but also on IBM SoftLayer.

• The set of content packs is ready for immediate use and has been enriched, covering the most typical automation scenarios related to infrastructure as a service (IaaS) and platform as a service (PaaS) scenarios.

The ready-to-use self-service catalog

Additionally, the self-service user interface has a new look and feel, including a brand new dashboard that helps you understand the overall health status of your cloud.

The IBM Cloud Orchestrator dashboard

You can read more about IBM Cloud Orchestrator features and capabilities here.

If you’d like to discuss things further and learn more about the new release, leave a comment below or get in touch with me on Twitter @DeGaRoss.

The post What’s new in IBM Cloud Orchestrator 2.4? appeared first on Thoughts on Cloud.

by Rossella De Gaetano at October 10, 2014 08:45 PM

Solinea

Making the Case for OpenStack—You’d Be Surprised What Enterprises Overlook

This is the first of a two part series that highlights what executives need to consider when investing in an OpenStack cloud infrastructure. It turns out that the biggest obstacles to the successful implementation of an OpenStack cloud have nothing to do with technology…



by Francesco Paola (fpaola@solinea.com) at October 10, 2014 05:00 PM

Kyle Mestery

What’s New in Neutron for OpenStack Juno

As of today, we just published the second Juno release candidate for Neutron. The expectation is that this will be the final RC and will become the official 2014.2 release of OpenStack Neutron. I thought I would take a moment to highlight some of the awesome work done by our community during the past 6 months.

Distributed Virtual Router

By far one of the largest, if not the largest, features we added as a team was the addition of Distributed Virtual Router (DVR) functionality. The team working on this has spent multiple cycles iterating on this code, and it finally landed in Juno. This is an exciting development because in prior versions of Neutron, the default L3 routing behavior was to send all traffic to L3 network nodes for routers. Clearly this presents issues around single points of failure on those L3 network nodes, not to mention the issue of having those L3 nodes become traffic bottlenecks. With DVR, routing functionality is distributed to each compute node, removing the need for a central L3 network node for this functionality. NAT functionality is also distributed in a similar manner. SNAT, however, is still centralized. The reason for this is the requirement of burning an IPv4 address on each compute node in a distributed SNAT environment. The wiki listed above has many more details on the DVR architecture.

IPv6

The Juno release of Neutron closes the gaps in IPv6 support, allowing full IPv6 for tenant networks. We’ve added the capability for Neutron to manage the radvd daemon to serve IPv6 Router Advertisements (RAs) when required. For those looking to deploy Neutron in a pure IPv6 environment, you’ll find Juno allows for such deployments.

Security Group Enhancements

There are some well known issues around security group scaling with previous versions of OpenStack Neutron. In Juno, we’ve addressed these issues with two very important blueprints: The addition of ipset in lieu of iptables to manage security group rules on compute nodes, and the refactoring of the security_group_rules_for_devices RPC call. Both of these additions are meant to scale and dramatically improve the performance of the security groups implementations of Neutron.

L3 Agent Improvements

We made some serious performance improvements in the L3 agent during Juno, as well as added the capability to have HA for the L3 agent. Both of these will help with users deploying the Neutron L3 agent. The HA work in particular means you can now have redundancy for L3 agents. As I mentioned earlier, the L3 agent is still required for SNAT traffic for DVR. So this improvement means there is no longer a single point of failure for that traffic when using the DVR solution. For deployers who choose not to use DVR, now you can have HA for all your L3 routing and NAT traffic as well.

Summary

I’ve just highlighted some of the many improvements we’ve made in Juno for Neutron. For information on more features, including new plugins and drivers, check out the release notes. We’re excited by the work the Neutron community has done for Juno, and we’re looking forward to an exciting development cycle for Kilo as well!

by mestery at October 10, 2014 03:43 PM

OpenStack Blog

OpenStack Community Weekly Newsletter (Oct 3 – 10)

Mentoring others and yourself

While this topic would have been good for the Tips-and-tricks section below, I think it deserves to open this week’s wrap-up. With a new session of Outreach Program for Women about to start for OpenStack, our good mentor Flavio Percoco gives some ideas on being a good mentor.

OpenStack Technical Committee Update

The last meeting of the current Technical Committee before the elections (which started today). Vishvananda Ishaya wraps up the conversations around graduation, the contributor license agreement and the “big tent”.

Next steps for ‘Hidden Influencers’

With Paris only weeks away it’s time to announce that we have a time and place to meet people whose job is to decide what OpenStack means for their company. The OpenStack Foundation has offered a room to meet in Paris on Monday, November 3rd, in the afternoon: please add the meeting to your schedule and join the mailing list.

The Road To Paris 2014 – Deadlines and Resources

During the Paris Summit there will be a working session for the Women of OpenStack to frame up more defined goals and line out a blueprint for the group moving forward. We encourage all women in the community to complete this very short survey to provide input for the group.

Relevant Conversations

Tips ‘n Tricks

Security Advisories and Notices

Upcoming Events

Other News

Got Answers?

Ask OpenStack is the go-to destination for OpenStack users. Interesting questions waiting for answers:

Welcome New Reviewers, Developers and Core Reviewers

Mudassir Latif Daniel Mellado
Anna Joakim Löfgren
woody Shaifali Agrawal
Barnaby Court Oleksii Zamiatin

OpenStack Reactions

success

Just finished a successful and very productive mid-cycle hackathon

The weekly newsletter is a way for the community to learn about all the various activities occurring on a weekly basis. If you would like to add content to a weekly update or have an idea about this newsletter, please leave a comment.

by Stefano Maffulli at October 10, 2014 03:40 PM

Opensource.com

What it takes to make a cloud deployment successful

Mark Voelker is no stranger to the OpenStack community. As a technical leader at Cisco and a co-founder of the Triangle OpenStack Meetup, Mark gets to see OpenStack from a lot of different lenses.

by Jason Baker at October 10, 2014 09:00 AM

October 09, 2014

Flavio Percoco

Mentoring others and yourself

Mentoring is one of the things I enjoy doing the most. I don't consider myself the ultimate expert on things, but I've definitely gone through enough that has led me to become a mentor in different technical areas.

The reason I enjoy this process so much is that it allows me to relate with other great people, cultures and minds. Every interaction is full of joy, gratitude, knowledge and collaboration. It helps me understand how to interact with different people, how to work together with different cultures, and it allows me to integrate more with the environment I live in. Every day that goes by, either by being part of a program or in my daily actions, I try to give back to people more than what I've received in my life.

I believe, like me, many other people feel the same and/or are interested in this topic. Therefore, I'm writing this post with some of the experiences I've lived and the things I've learned so far by mentoring others.

Mentoring is more than just teaching

When you agree to be someone's mentor, you're agreeing to be this person's guide during a finite period of time. You're agreeing to do more than just teaching things. You're agreeing to live by example, to lead your mentees to whatever knowledge they are seeking, to trust your mentees' passion as they trust yours.

You cannot teach people to trust themselves if you don't trust yourself. You cannot help someone become more secure if you are not secure enough yourself. Nonetheless, you also need to be careful about how much you think you know and how much you think your mentee doesn't know. One of the worst mistakes you can make as a mentor is to underestimate your mentee's knowledge. The more you think you know about this process, the less likely you'll succeed. You need to be alert, you need to listen as if you didn't know what to do. Every new day as a mentor brings experiences that will make you think differently; embrace them.

Nevertheless, in order to succeed as a mentor, you need to first understand what your tasks are. It is extremely important for you to revisit these goals everyday and make sure you stand by them. Here's a list of questions which, although not exhaustive, I think is a good place to begin with:

  • What is it that you want to share?
  • How much time are you willing to dedicate?
  • How much time are you actually going to be able to dedicate?
  • How much are you willing to stop and listen to what your mentee has to say?
  • Are you ready to become part of someone's growth and let that person become part of yours?

Although the above questions may sound a bit philosophical, I believe they build the ground for a good start as a mentor. Understanding that you're not the only one who's going to teach something is mandatory and you, as a mentor, need to keep your mind open to this. These questions are valid regardless of the project you're mentoring on, they are the bases of what you will - or won't - do as a mentor.

Don't give your mentee fishes

This, obviously, comes from the proverb:

Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime

When there's a lack of trust in one's own knowledge, it is very likely that we'll wait and hope for someone to give us what we're looking for. As a mentor, depending on the time you have available, your patience, your dogs' urges to go for a walk, you'll be tempted to give the answer just to get it over with. As the aforementioned proverb implies, this is not good at all and it'll work just once, so don't.

Giving the answer won't help either you or your mentee grow. Your mentee won't learn how to find the answers he/she needs, and you won't learn how to lead your mentee to the answer without giving it away. Be smart, work on different ways that will lead your mentee to the answer. Teach your mentee how to ask the right questions and reply with more constructive questions that will "light the bulb" in your mentee's brain.

Avoid conversations like:

mentee: I think X is what I was looking for
mentor: Not exactly but you're almost there

Instead, look for more constructive answers that will bug your mentee's mind:

mentee: I think X is what I was looking for
mentor: If X is what you're looking for, could Y be sufficient as well? Have you thought about how X and Y may be related?

Teach your mentee how to switch perspective, how to think out of the box. Perspective is one of the most important things in emotional intelligence and you must lead your mentees towards a complete view of the problem in a way that they'll also learn the importance of this process for problem solving.

As hard as it can be, you need to be patient. Your mentees are smart, probably smarter than you, but they may not know it yet. Make sure your mentees know how smart they are at the end of the journey. Give your mentees the required tools to answer the questions they may find, because the more tools you teach them to use, the more independent they'll be at the end.

Agreeing feels nice, disagreeing feels better

One thing that many mentees find very difficult to do is to disagree with other people. There are several reasons for that to happen. Mentees may feel they don't know enough, or that others with more experience are always right, which is far, far, far from being true. Mentees need to be told that disagreeing is good and that good discussions are, most of the time, the source of great ideas and epiphanies.

Teaching mentees how to trust themselves is not enough to help them understand that disagreeing is usually better than agreeing. They also need to be taught how to disagree. It's not enough to say "I disagree". When you're teaching someone to disagree, you must let that person know that disagreeing is the first step towards an argument and that there's nothing bad about arguing.

I often challenge mentees when they agree with me on something. I'd ask them the reasons why they think I'm right and force them to think through possible different scenarios and whether what I'm saying is actually correct or not. It's really important to encourage mentees to think things through. The more they do this exercise, the easier it'll be for them to spot things in the future. Collaboration is not just about doing things together but about reviewing each other's work as well. I trust the team I work with to review my work and provide feedback if something doesn't feel right. I want people to disagree with me and challenge me to think things through. I believe this is an important thing to learn that many communities with long-time leaders lack. Therefore, I believe this is a key thing to teach to your mentees.

Disagreeing is not just about discussing on someone's argument. It's also about communicating, humbly, one's opinions on that matter. I believe that what actually defines a community is not the group of people willing to work on a common goal but the ability of those people to communicate and discuss things that will then lead the work towards that goal. Two people can work on the same thing without even talking - it'd be hard, yes, but it's still possible. It's our duty, as mentors, to encourage mentees to communicate with the rest of the community regardless of its size and how scary it could be.

Co-mentoring is even better

Remember what we said about perspective? Perspective is one of the things that matters the most in our daily tasks. Depending on our perspective, things could go in many different ways. It's been said thousands of times that thinking out of the box is important and that changing perspective allows for a better and stronger personal growth.

If you enjoy mentoring other people like I do, you probably want to do it often and, more importantly, you always want to be there. Despite this being a great thing, you need to understand that mentees are not yours and that other people, with passions similar to yours and a different perspective, may also help the mentee grow. People with different experiences see the world in different ways and you want your mentee to know that. You want your mentee to learn from different people, you want your mentee to think out of the box, and to do that, your mentee needs to change perspective.

I often encourage mentees to work on features with other experienced folks in the community who can guide them down different roads. This helps them improve their communication skills and their ability to work with distributed teams that may not be in the same time zone, and it also helps them learn from folks who may have different points of view.

Keep an eye on your mentee

Mentees need to be left alone as much as they need to be guided. Excessive hand-holding will hurt your mentee as much as excessive independence. Note that I'm not saying mentees shouldn't be proactive or do things by themselves. What I'm saying is that you need to be ready to provide guidance when it's required and you cannot know when your mentees need guidance if you're not keeping an eye on the things they are doing.

There's a huge difference between controlling mentees and keeping an eye on them. By keeping an eye on them you're just making sure you'll be ready to act when needed and you won't wait until it's too late to do so. You want your mentees to be proactive, to seek the answers to their questions, to experiment with different technologies and ideas. You just need to be there, just be there.

As part of the guidance I like to provide, I'm always looking forward to my mentees' patches. I encourage them to publish their work even if it's not ready yet. That helps them share ideas that are still in the works, and it helps me see what path they're going down. If there are things I don't agree with, I always try to understand why they think those things are a good idea and then provide any guidance needed if I still disagree.

Jerks are deprecated but still around

If you've read my 'Jerks are deprecated' post, I'm pretty sure you expected me to say something along those lines in this one. Despite my big wish to have jerks treated with some magical 'be-nice' cure, they are still around and you need to make sure your mentee knows how to interact with them as well.

For some people, one of the hardest things to do when it comes to becoming part of a community, team or work environment is to speak up. Unfortunately, jerks have no mercy. It doesn't matter whether you're a newbie, a young person making your way through this world or a very experienced person. For people afraid of speaking up, jerks are probably the worst thing they have to face.

I believe jerks are like trolls. The more you're 'afraid' of them, the jerkier they'll be. The more you feed them, the more they'll chase you. Therefore, the best way to interact with jerks is not necessarily ignoring them - although that works - but treating them nicely.

Teach your mentee how to reply to jerks by keeping their reply in context, nice and direct, completely ignoring the fact that they're talking to a jerk. That's not going to change the fact that this person is a jerk but it'll help the mentee to understand that being nice is free and by replying nicely they'll feel good and still get their job done.

One thing that I also think is a great exercise is talking at conferences. You never know who'll attend your talk, and standing up in front of a diverse audience is not an easy task. By encouraging your mentees to talk at conferences, you're encouraging them to trust themselves and also be nice about it. You can guide your mentees through what talking in front of people means, you can give them some tips and tricks, and you can also walk them through ways of sharing their knowledge with a nice and trustworthy attitude. Make sure they understand that many questions could be asked by many different people. There are nice ways to reply to or even avoid a jerk's questions without feeding the jerk.

Thanking is for free

This is a quote from a conversation between myself and an OPW applicant. I've nothing else to add here:

> flaper87 exploreshaifali: please, do keep digging. The more questions you ask, the clearer it will be for you :)
> exploreshaifali flaper87, sure :)
> exploreshaifali Thanks!!
> flaper87 exploreshaifali: thank *you* ;)
> exploreshaifali flaper87, thanking me for what?
> flaper87 exploreshaifali: for your time, interest and perseverance. I appreciate you willing to work on this and I thank you for that. I look forward to see you around long enough to do way more.

Come to the bright side

By mentoring someone, you're improving yourself as well. You'll learn how to interact with different people from different cultures. You'll understand that emotions matter more than people think and that people's past, present and future are important for this learning process. You'll learn that you can't teach things you don't believe yourself - these contradictions will come up and you'll have to admit failure. Furthermore, you'll learn that you're your mentee's mentee and that this learning process goes both ways. I've learned as much from my mentees as they have, I hope, learned from me.

Really, mentoring is one of the things I like the most. I enjoy good conversations, laughs and sharing. It's a very thankful - yes, thankful - job to do and it's also emotionally rewarding. If you haven't done it before, I encourage you to do so. Even if you may not think so, I'm sure you've many things to share. Just make sure you understand how critical this work is and how much responsibility you'll have in your hands.

Last but not least, I'd like to make sure you understand you don't need to be in an 'official' program to mentor people. By living as an example and helping others, you're already doing so. Nonetheless, volunteering for mentoring other people is both needed and nice. Don't hesitate to do so.

by FlaPer87 at October 09, 2014 10:04 PM

OpenStack Blog

Futurecom 2014

Next week, the OpenStack Foundation is sponsoring the Congress and Business Trade Show at Futurecom 2014 in São Paulo, Brazil. Dualtec Cloud Builders, based in São Paulo, will be co-sponsoring the event with the Foundation.

OpenStack has been gaining popularity worldwide, and we are reaching potential users and community members via industry and community events. Futurecom will be the first industry event that the OpenStack Foundation sponsors in South America. Last month, Dualtec hosted an OpenStack Meetup in São Paulo with an impressive turnout of more than 500 IT managers. The event's speakers spread the word about OpenStack and its benefits, with more than ten hours of presentations, hands-on workshops and case studies from two prominent Brazilian companies using OpenStack: Serpro and UOL.

Dualtec Meetup

Heidi Bretz, the Director of Business Development for the Foundation, will be attending Futurecom and moderating a panel with several OpenStack users on Tuesday, October 14, 2:40 – 3:10pm in Room São Paulo. Panelists are:

  • Renato Serra Armani, Innovation Manager, Dualtec Cloud Builders
  • Roberval Aratame Ribeiro, Gerente Geral de Projetos – P&D – Cloud Services, UOL
  • Max Tkach, Technical Leader, Cloud Services, MercadoLibre

Marcelo Dieder, the OpenStack Ambassador in Brazil, will also be introduced.  All speakers, including Heidi Bretz from the OpenStack Foundation, will be available to chat after the session and in the OpenStack booth in the Tradeshow Hall B, Booth C22, from Monday evening through Thursday.

 


 

The OpenStack Foundation is expanding its strong history of supporting international events with the Futurecom sponsorship. The Foundation has also supported DEVIEW in South Korea, Cloud Connect China in Shanghai, CloudOpen in Düsseldorf and regional events in Taiwan, Tokyo, Tel Aviv, London, Paris, Berlin and Budapest. In the first half of 2015, OpenStack will sponsor linux.conf.au in Auckland in January and Cloud Expo Europe in London in April.

In addition to industry and regional events, the OpenStack Summit first went international in November 2013 in Hong Kong and again this November in Paris. Learn more or register for the upcoming OpenStack Paris Summit here.

by Allison Price at October 09, 2014 08:10 PM

Cloud Platform @ Symantec

Testing OpenStack Upgrades with a Separate Control Plane Using DevStack

Before upgrading OpenStack in your production environment, you may want to try an upgrade on a simple DevStack installation to flush out any issues beforehand.  For our next OpenStack upgrade from Havana to Icehouse in production, we will be setting up a separate control plane (with all OpenStack controller services) with the new OpenStack version, upgrading the existing compute nodes in place, and then switching the existing compute nodes to use the new control plane.  In comparison to running an in place upgrade of the control plane, switching to a new control plane has these advantages:

 

  •  Shorter downtime of the control plane from a user perspective
  •  Shorter rollback time if issues occur during the upgrade
  •  Less risk of rollback not working if needed

 

In our upgrade, we will also upgrade the database in place using the db sync commands.  This will be necessary to sync the database tables to the new version in preparation for the new control plane to use the existing database.

 

Most of the content I've seen previously on the web about OpenStack upgrades addresses the case of upgrading all services in place, but there isn't much out there about switching to a separate control plane.

 

To test switching to the new control plane with DevStack installs, these will be the high level steps:

 

  • Shut down services
  • Take a database backup
  • Migrate database
  • Bring up new controller services pointed to old database

 

Prerequisites

 

Start with working DevStack installs in separate VMs.  One DevStack install should be running and tested with the old version of OpenStack, and another DevStack install should be running and tested with the new version of OpenStack.  To use an older version than the DevStack master branch, git clone DevStack and then switch to the older branch before running stack.sh.  The two VMs will need to be able to access each other on the network.  I use VMware Fusion to run 2 VMs on my local system, but there are other software packages that can be used for virtualization.

 

Shut Down Services

 

Shut down the controller and compute services on the two DevStack installs to prevent new data from being written to the database.  This can be done by stopping all services in the DevStack screen sessions.

 

Take a Database Backup

 

Even though this is just a test with DevStack, it's a good idea to get in the habit of taking backups before modifying data.  On the DevStack install running the old OpenStack version, run this command, which is a variation on the backup command from the OpenStack upgrade docs:

 

# mysqldump -u root -p --opt --add-drop-database --all-databases > openstack-db-backup.sql

 

The database on the DevStack install running the old OpenStack version is analogous to your existing production database, which the new control plane will end up using by the end of the upgrade.  There's no need to take a database backup on the DevStack install running the new OpenStack version, as we won't be modifying that database.

 

Migrate Database

 

It's next necessary to migrate the old database to the schema of the new OpenStack version, as the new version of the code may depend on changes to tables and columns.  The easiest way to do this while also preparing for the new version of the code to use the existing database is to point the services on the DevStack install with the new OpenStack version to the database on the other DevStack install.  You'll need to modify the configurations for each service you plan to test.  For the nova service, modify nova.conf:

 

Change

sql_connection = mysql://root:[my sql password]@127.0.0.1/nova?charset=utf8

 

To

sql_connection = mysql://root:[my sql password]@[IP address of old DevStack]/nova?charset=utf8

 

Now that the new version of the code is pointing to the existing database, run the db sync:

 

$ nova-manage db sync

 

By default, the nova-manage command will use the sql_connection value in nova.conf for the location of the database to sync.

 

Repeat this step for each of the conf files for the other services you're interested in upgrading.
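If you're scripting this step, a small helper along these lines can run the sync for each service you care about (a rough sketch; the exact db sync command differs slightly between projects and releases, so check the commands against your installed versions before relying on them):

import subprocess

# Each OpenStack service ships its own db sync entry point; the
# spelling varies by project (db sync vs db_sync) and by release.
DB_SYNC_COMMANDS = [
    ['nova-manage', 'db', 'sync'],
    ['glance-manage', 'db_sync'],
    ['keystone-manage', 'db_sync'],
    ['cinder-manage', 'db', 'sync'],
]

for cmd in DB_SYNC_COMMANDS:
    print('Running: %s' % ' '.join(cmd))
    # Stop immediately if a sync fails so you can investigate before
    # touching the next service's schema.
    subprocess.check_call(cmd)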

 

Bring up new controller services pointed to old database

 

In the previous step, the controller services on the new control plane were pointed to the old database.  The old database has already synced to work with the new version of the code.  Next, bring up the services on the new control plane.

 

Limitations When Testing this Method with DevStack

- As DevStack uses the local file system for the Glance image repository, the Glance images need to be synced over to the VM with the new OpenStack version.

- Note that in this case, we didn't do an in place upgrade of the compute nodes.  In production, you may want to do an in place upgrade of the compute nodes to reduce complexity of the entire upgrade.

 

After syncing the Glance images, you'll now be able to start up a VM using the new control plane that uses the migrated database!

by brad_pokorny at October 09, 2014 06:16 PM

Amar Kapadia

4 Issues that Ail Swift

While I'm a Swift enthusiast, I also realize it hasn't taken over the world. Other object storage systems (open-source e.g. CEPH, and proprietary e.g. Scality, Cleversafe, Amplidata) are finding success in various use-cases. In fact, industry analyst Marc Staimer is outright down on Swift. While his scoring is overly harsh on Swift, he does make a number of great points.

So what ails Swift? Or in other words, what issues need to be fixed to make Swift a clear #1? I can think of four key issues.

Read more »

by Amar Kapadia (noreply@blogger.com) at October 09, 2014 03:26 PM