Category Archives: devops

Docker-based FIO I/O benchmarking

[Image: http://i.imgur.com/3oFD3XP.png]

What is FIO?

fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user. The typical use of fio is to write a job file matching the I/O load one wants to simulate. – (https://linux.die.net/man/1/fio)

fio is a great tool for measuring the I/O of a specific application workload on a particular device or file. It is a detailed benchmarking tool with many options and is widely used today. I came across it while working at EMC, when I needed to benchmark the disk I/O of applications running in different Linux container runtimes. This leads me to my next topic.

Why Docker based fio-tools

One of the projects I was working on used Docker on AWS and on various private cloud deployments. We wanted to see how workloads performed in these different cloud environments inside Docker containers with various CPU, memory, and disk I/O limits, on various block, flash, or DAS-based storage devices.

One way we wanted to do this was to containerize fio and let users pass the workload configuration and the disk to the container that was doing the testing.

The first part of this was to containerize fio with the option to pass in job files by pathname or by URL, such as a raw GitHub Gist.

The Dockerfile (below) is based on Ubuntu 14.10, which admittedly could be smaller, but it lets us easily install fio and set a CMD script called run.sh.

FROM ubuntu:14.10
MAINTAINER <Ryan Wallner ryan.wallner@clusterhq.com>

RUN sed -i -e 's/archive.ubuntu.com/old-releases.ubuntu.com/g' /etc/apt/sources.list
RUN apt-get -y update && apt-get -y install fio wget

VOLUME /tmp/fio-data
ADD run.sh /opt/run.sh
RUN chmod +x /opt/run.sh
WORKDIR /tmp/fio-data
CMD ["/opt/run.sh"]

What does run.sh do? The script does a few things. It checks that you are passing JOBFILES (the fio job file names), which, without REMOTEFILES, are expected to exist in `/tmp/fio-data`. It also cleans up the fio-data directory by moving any job files (*.fio) out and then back in, removing old graphs and output along the way. If the user passes in REMOTEFILES, those files are downloaded from the internet with wget before being used.

#!/bin/bash

# JOBFILES is required: the job file name(s) to hand to fio.
[ -z "$JOBFILES" ] && echo "Need to set JOBFILES" && exit 1
echo "Running $JOBFILES"

# We really want no old data in here except the fio script
mv /tmp/fio-data/*.fio /tmp/
rm -rf /tmp/fio-data/*
mv /tmp/*.fio /tmp/fio-data/

if [ -n "$REMOTEFILES" ]; then
  # We really want no old data in here
  rm -rf /tmp/fio-data/*
  IFS=' '
  echo "Gathering remote files..."
  # REMOTEFILES is a space-separated list of URLs, left unquoted so it splits
  for file in $REMOTEFILES; do
    wget --directory-prefix=/tmp/fio-data/ "$file"
  done
fi

# Left unquoted on purpose so several job files can be passed
fio $JOBFILES

There are two other Dockerfiles, aimed at two other operations: 1. producing graphs of the output data with fio2gnuplot, and 2. serving the graphs and output from a Python SimpleHTTPServer on port 8000.

All Dockerfiles and examples can be found here (https://github.com/wallnerryan/fio-tools). The repository also includes an all-in-one image, called fiotools-aio, that runs the job, generates the graphs, and serves them.

How to use it

Build the images or use the public images
Create a Fio Jobfile
Run the fio-tool image

docker run -v /tmp/fio-data:/tmp/fio-data \
-e JOBFILES=<your-fio-jobfile> \
wallnerryan/fio-tool

If your file is a remote raw text file, you can use REMOTEFILES

docker run -v /tmp/fio-data:/tmp/fio-data \
-e REMOTEFILES="http://url.com/.fio" \
-e JOBFILES=<your-fio-jobfile> wallnerryan/fio-tool

Run the fio-genplots script

docker run -v /tmp/fio-data:/tmp/fio-data wallnerryan/fio-genplots \
<fio2gnuplot options>

Serve your Graph Images and Log Files

docker run -p 8000:8000 -d -v /tmp/fio-data:/tmp/fio-data \
wallnerryan/fio-plotserve

Easiest Way, run the “all in one” image. (Will auto produce IOPS and BW graphs and serve them)

docker run -p 8000:8000 -v /tmp/fio-data \
-e REMOTEFILES="http://url.com/.fio" \
-e JOBFILES=<your-fio-jobfile> \
-e PLOTNAME=MyTest \
-d --name MyFioTest wallnerryan/fiotools-aio

Other Examples

Important

Your fio job file should reference a mount or disk that you would like to run the job against. In the job file it will look something like directory=/my/mounted/volume, which is how you test against Docker volumes.
If you want to run more than one all-in-one job, just use -v /tmp/fio-data instead of -v /tmp/fio-data:/tmp/fio-data. The host-path form is only needed when you run the individual tool images separately.
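As a rough sketch of what such a job file could look like, the snippet below writes one into the directory that gets mounted at /tmp/fio-data. The job name randread-16k, the sizes, and the /myvol mount point are assumptions chosen for illustration (they are not taken from the Gist used in the examples); adjust them to the volume you actually mount.

```shell
#!/bin/bash
# Hypothetical example: JOB_DIR is the host directory you later mount
# at /tmp/fio-data; it defaults to /tmp/fio-data on the host.
JOB_DIR="${JOB_DIR:-/tmp/fio-data}"
mkdir -p "$JOB_DIR"

# A 16k random-read job that targets the volume mounted at /myvol.
# The *_log options make fio emit the bandwidth and IOPS logs that
# fio2gnuplot can later turn into graphs.
cat > "$JOB_DIR/job.fio" <<'EOF'
[global]
ioengine=libaio
direct=1
directory=/myvol
size=256m
runtime=60
time_based
write_bw_log=randread-16k
write_iops_log=randread-16k
log_avg_msec=1000

[randread-16k]
rw=randread
bs=16k
iodepth=16
EOF

echo "Wrote $JOB_DIR/job.fio"
```

With a file like this in place, -e JOBFILES=job.fio together with -v myvol1:/myvol would point fio at the mounted volume rather than at the container's own filesystem.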

To use with docker and docker volumes

docker run \
-e REMOTEFILES="https://gist.githubusercontent.com/wallnerryan/fd0146ee3122278d7b5f/raw/cdd8de476abbecb5fb5c56239ab9b6eb3cec3ed5/job.fio" \
-v /tmp/fio-data:/tmp/fio-data \
--volume-driver flocker \
-v myvol1:/myvol \
-e JOBFILES=job.fio wallnerryan/fio-tool

To produce graphs, run the fio-genplots container with -t <name of your graph> -p <pattern of your log files>

Produce Bandwidth Graphs

docker run -v /tmp/fio-data:/tmp/fio-data wallnerryan/fio-genplots \
-t My16kAWSRandomReadTest -b -g -p *_bw*

Produce IOPS graphs

docker run -v /tmp/fio-data:/tmp/fio-data wallnerryan/fio-genplots \
-t My16kAWSRandomReadTest -i -g -p *_iops*

Simply serve them on port 8000

docker run -p 8000:8000 -d \
-v /tmp/fio-data:/tmp/fio-data \
wallnerryan/fio-plotserve

To use the all-in-one image

docker run \
-p 8000:8000 \
-v /tmp/fio-data \
-e REMOTEFILES="https://gist.githubusercontent.com/wallnerryan/fd0146ee3122278d7b5f/raw/006ff707bc1a4aae570b33f4f4cd7729f7d88f43/job.fio" \
-e JOBFILES=job.fio \
-e PLOTNAME=MyTest \
--volume-driver flocker \
-v myvol1:/myvol \
-d \
--name MyTest wallnerryan/fiotools-aio

To use with docker-machine/boot2docker/DockerForMac

You can use a remote fio configuration file using the REMOTEFILES env variable.

docker run \
-e REMOTEFILES="https://gist.githubusercontent.com/wallnerryan/fd0146ee3122278d7b5f/raw/d089b6321746fe2928ce3f89fe64b437d1f669df/job.fio" \
-e JOBFILES=job.fio \
-v /Users/wallnerryan/Desktop/fio:/tmp/fio-data \
wallnerryan/fio-tool

(or) If you have a directory that already has the job files in it. *NOTE*: the directory must be in a shared folder, configured under Docker > Preferences > File Sharing.

docker run -v /Users/wallnerryan/Desktop/fio:/tmp/fio-data \
-e JOBFILES=job.fio wallnerryan/fio-tool

To produce graphs, run the genplots container with -t <name of your graph> -p <pattern of your log files>

docker run \
-v /Users/wallnerryan/Desktop/fio:/tmp/fio-data wallnerryan/fio-genplots \
-t My16kAWSRandomReadTest -b -g -p *_bw*

Simply serve them on port 8000

docker run -v /Users/wallnerryan/Desktop/fio:/tmp/fio-data \
-d -p 8000:8000 wallnerryan/fio-plotserve

Notes

The fio-tools container will clean up the /tmp/fio-data volume by default when you re-run it.
If you want to save any data, copy this data out or save the files locally.
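One possible way to do that is a small helper that copies the whole directory into a timestamped folder before the next run. This is only a sketch: archive_fio_data and the $HOME/fio-results destination are names I made up for illustration, not part of fio-tools.

```shell
#!/bin/bash
# archive_fio_data <source-dir> <backup-root>
# Copies everything in <source-dir> into a new timestamped directory
# under <backup-root> and prints the path of that directory.
archive_fio_data() {
  local src="$1" root="$2"
  local dest
  dest="$root/fio-data-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" && cp -a "$src"/. "$dest"/ && printf '%s\n' "$dest"
}

# Example usage before re-running the container (paths are assumptions):
#   archive_fio_data /tmp/fio-data "$HOME/fio-results"
```

Because the destination name includes the time, repeated runs land in separate folders instead of overwriting each other.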

How to get graphs

When you serve on port 8000, you will get a list of all the logs and plots that were created. Click on the .png files to see the graphs (see below for an example screen).

[Image: http://i.imgur.com/nksQkZi.png]

Testing and building with codefresh

As a side note, I recently added this repository to build on Codefresh. Right now it only builds the fiotools-aio Dockerfile, which is the one I find most useful, but it was an easy experience that I wanted to add to the end of this post.

Navigate to https://g.codefresh.io/repositories? or create a free account by logging into Codefresh with your GitHub account. When you log in with GitHub, Codefresh gets access to the repositories you granted it access to, and this is where the fio-tools images are.

I added the repository as a build and configured it like so.

[Screenshot: Codefresh build configuration (screen-shot-2016-12-29-at-2-45-46-pm)]

This will automatically build my Dockerfile and run any integration and unit tests I may have configured in Codefresh. Right now I have none, but I will soon add a simple job to run against a file as an integration test with a Codefresh composition.

Conclusion

Over my time using both native Linux tools and Docker-based or containerized tools, I have found that there is sometimes a need for both. In fact, when testing container-native application workloads, it is sometimes best to get metrics or benchmarks from the point of view of the application, which is why we chose to run fio as a microservice itself.

Hopefully this was an enjoyable read and thanks for stopping by!

Ryan

Service Oriented Architecture vs Modern Microservices: What’s the difference?

Service-Oriented Architecture:

Some of the concepts of SOA that I’d like to mention (not fully encompassing):

Technologies widely used were SOAP, XML, WSDL, XSD and lots of Java.
SOAs typically had a Service Bus or ESB (Enterprise Service Bus), a complex middleware aimed at providing access to and masking of interfaces.
Identification and Inventory
Value chain and business model are more about changing the entire business process.

Modern Microservices:

Technologies widely used are JSON, REST/HTTP and polyglot services.
Communication is done over HTTP and the interfaces are abstracted using RESTful contracts.
Service Discovery
Value chain and business model are about efficiencies, small teams and DevOps practices while eliminating silos.

The Bulkhead Analogy

I want to spend a little time on one of the analogies about modern microservices that stuck with me: the bulkhead analogy. I cannot for the life of me remember where I heard it, and I could not find the original author by googling, so credit to whoever you are.

The bulkhead analogy is actually pretty simple, but it makes a powerful statement about microservice design. The analogy is that an MSA (microservices architecture), like a large ship, is made up of many containers (or, in the ship’s case, bulkheads) that have boundaries between them and hold different components of the ship, such as engines, cargo, pumps, etc. In an MSA, these containers hold different functions or processes that each do something. Whether that is handling auth requests, connecting to a DB, or servicing a lookup or transformation mechanism doesn’t matter; in both cases you want all containers to be undamaged for everything to run the best it can.

The bulkhead analogy goes further to say that if a container gets damaged and takes on water, the entire ship should not sink due to one or a few failures. Applied to an MSA, this means your microservices should not be designed in a way where the failure of a few of them would take down your entire application or business process. In essence, you design the bulkheads or containers to take damage and remain afloat, or “running”.

Again, this analogy is quite simple, but when designing your MSA it’s important to think about these details, which is why things like proper RESTful design and chaos testing are worth your time in the long run.
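To make the bulkhead idea a little more concrete, here is a minimal sketch of one way a caller can keep a broken dependency from sinking it: give the call a time limit and fall back to a degraded answer when it fails. The function name call_with_fallback is hypothetical, and the sketch assumes the GNU coreutils timeout command is available.

```shell
#!/bin/bash
# call_with_fallback <seconds> <fallback-text> <command...>
# Runs the command with a time limit. If it fails or takes too long,
# prints the fallback text instead of letting the failure spread.
call_with_fallback() {
  local limit="$1" fallback="$2"
  shift 2
  local out
  if out="$(timeout "$limit" "$@" 2>/dev/null)"; then
    printf '%s\n' "$out"
  else
    printf '%s\n' "$fallback"
  fi
}

# A healthy dependency answers normally...
call_with_fallback 1 "recommendations unavailable" echo "top picks"
# ...while a hung one is cut off after a second and replaced by the fallback.
call_with_fallback 1 "recommendations unavailable" sleep 3
```

The caller stays “afloat” in both cases; only the one compartment that depends on the broken service is degraded.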

Similarities and Differences of the two architectures / architecture styles:

Given the little glimpse I’ve provided above of service-oriented architectures and microservices architectures, I want to spend a little time talking about the obvious similarities and differences.

Similarities

Both SOA and MSA do the following:

Code or service reuse
Loose coupling of services
Extensibility of the system as a whole
Well-defined, self-contained services or functions that overall help the business process or system
Services Registries/Catalogs to discover services

Differences

Some of the differences that stick out to me are:

Focus on business process: instead of SOA’s focus on many services making up one important business process, MSA focuses on allowing one thing (a containerized process) to do one thing and do it well. This allows tighter context boundaries for microservices.
SOA leans towards SOAP, XML and WSDL, while MSA favors JSON, REST and polyglot services. This is one of the major differences to me: even though it’s just a tech difference, this RESTful, polyglot paradigm enables MSAs to thrive with today’s developers.
The value chain and business model of MSA are more DevOps-centric, allowing the focus to be on loosely coupled teams that break down silos and can focus on faster release cycles and CI/CD of their services. With SOA, teams typically still had one monolithic view of the ESB and services, without the DevOps focus.

Conclusion

Overall, this post was a high-level overview of what I think are some of the concepts and major differences between traditional SOA and modern microservices. It stemmed from a course I took during my master’s that explored SOA while I was working on microservices in the industry. My main point is that SOA and MSA are very similar, with MSA being SOA’s offspring in a way, applying modern tooling and architecture approaches to today’s scalable data center.

Note: by no means did I cover SOA or MSA well enough to do them any real justice, so I suggest looking into some of the topics discussed here or reading through some of the references below if you’re interested.

Cheers!

[1] Rosen, Michael. “Applied SOA: Service-Oriented Architecture and Design Strategies.” Wiley Publishing, Inc., 2008.

[2] Gartner Research. “Service Oriented Architectures, Part 1.” //www.gartner.com/doc/code/29201

[3] Aka Sniv. “SOA fundamentals in a nutshell.” February 2015. http://www.ibm.com/developerworks/webservices/tutorials/ws-soa-ibmcertified/ws-soa-ibmcertified.html

What Would Microservices do!?

Private/public IaaS and PaaS environments are some of the fastest-moving technology domains right now. I credit the speed of change and adoption to the communities that surround them, whether open-source or within the enterprise. As someone who works in the enterprise but contributes to open source, I find that many people are surprised to learn that within the enterprise there is a whole separate, thriving community around these technologies. That being said, I have been working with OS-level virtualization technologies for (almost) the past 3 years. Over the past 2 years most people have become familiar with the “Docker boom”, and now, with the resurgence of SoA/microservices, I think it’s worthwhile to explore which modern technologies are involved, what the drivers for change are, and how they affect your applications and your business alike.

Containers and microservices are compelling technologies and architectures; however, exposing their benefits and understanding where they come from is a harder subject to grasp. When you are deciding whether to create a microservice architecture for your business application, or working out which contexts are bound to which functionality, it can seem like the complexity isn’t worth it in the long run. So here I explore some of the knowledge about microservices that is already out there, to understand when and why to turn to microservices and why, in many cases, there really isn’t a need to.

In this post I will talk about some of the major drivers and topics of microservices and how I see them in relation to data-center technologies and applications. These are my own words and solely my opinion; however, I hope this post can help others understand this space a little better. I will talk briefly about Conway’s Law, what it means to break down the silos, why this matters for continuous delivery, the importance of the Unix philosophy, how to define a microservice, their relationship to SOA, the complexity involved in the architecture, what changes must happen in the organization, what companies and products are involved in this space, how to write a microservice, the importance of APIs and service discovery, and layers of persistence.

Microservices

The best definition, in my opinion, is “A microservice fits in your head”. There are other definitions involving a number of pizzas or a specific number of lines of code, but I don’t like putting these boundaries on what a microservice is. Put most simply, a microservice is something small enough to conceptually fit in your head without you having to think too much about it. You can argue about how much someone can fit in their head, but that’s just rubbish and unimportant to me.

I like to bring up the Unix principle here, as I’ve heard from folks at Joyent and others in the field: programs are designed to do one thing and do that one thing well, like “ls” or “cat” for instance. Typically, if you design a microservice this way, you can limit its internal failure domain, because it does one thing and exposes an API to do so. Now, microservices is a loaded term, and there are similarities between it and SoA. Both are just that, architectures, and you can find similarities in many of their parts. Some of the main differences are that SoA used XML, SOAP, typically a single message bus for communication, and a shared data source for services. Microservices use more modern, lightweight protocols like RESTful APIs, JSON, HTTP and RPC, and typically a single microservice is attached to its own data source, whether it’s a copy, a shard, or its own distributed database. This helps with multi-tenancy, flexibility and the context boundaries that help an architecture like microservices scale.

One of the first things people start to realize when deep diving into microservices is the amount of complexity that comes out of slicing up the monolith, because inherently you need to orchestrate, monitor, audit and log many more processes, containers, services, etc. than you did with a typical monolithic application. The fact that these architectures are much more “elastic and ephemeral” than others forces technical changes that center around the smallest unit of business logic that helps deliver business value when combined with other services to deliver the end goal. This way each smaller unit can have its own change lifecycle, scale independently, and be developed free of the other dependencies within the typical monolith.

This drives the necessity to adopt a DevOps culture and to change organizationally, as each service should be developed by an independent, smaller team that can release code within its own cycle. Teams still need to adhere to the invisible contracts between the services; these contracts are the APIs themselves, through which the services talk to one another. I could spend an entire post on this topic, but there is a great book called “Migrating to Cloud-Native Application Architectures” by Matt Stine of Pivotal (which is free, download here) that talks about organizational changes, API-based collaboration, microservices and more. There is also a great post by Martin Fowler (here) that talks about microservices and the way Conway’s Law affects the organization.

Importance of APIs

I want to briefly talk about orchestration, choreography and the importance of the APIs that exist within a microservices architecture. A small note on choreography: this is another term that may be new, but it’s related to orchestration. Choreography is orchestration turned on its head. Instead of an orchestration unit signaling when things happen, the intelligence is pushed to the endpoints, and those endpoints react to events in a changing environment; each service therefore knows its own job. A great comparison of the two is (here) in the book “Building Microservices” by Sam Newman. REST APIs are at the heart of this communication: if an event is received from a customer or user, a choreography chain is initialized and the endpoints talk to each other via these APIs. These APIs must therefore remain robust and backward compatible, and act as contracts for how services interact. A great post on the Netflix microservices work (here) explains this in a little more detail.

If the last few paragraphs and resources make some sense, you end up with a combination of loosely coupled services, strict boundaries, APIs (contracts), robust choreography, and vital health and monitoring for all deployed services. These services can be scaled, monitored and moved independently without risk, and they react well to failures. Some examples of tools to help you do this can be found at http://netflix.github.io. This all sounds great, but without taking the approach of “design for the integrations, not the infrastructure/platform” (which I’ve heard a lot but can’t quite figure out who the quote belongs to; kudos to whoever you are :] ) it can fail pretty easily. There is a lot of detail I didn’t cover above, and I suggest looking into the sources I listed as a start on getting into the details of each part. For now I am going to turn to a couple of topics within microservices: Service Discovery & Registration, and Data Persistence layers in the stack.

Service Discovery and Registration

Distributed systems at scale using microservices need a way to register and discover which services and endpoints are available. Enter service discovery tools like Consul, etcd, ZooKeeper, Eureka, and Doozerd (among others not listed). These tools make it easy for services to call this layer and find out how to consume whatever else is available. Typically this helps one service find out how to connect to and use another service. There are three main processes, IMO, for applications using this layer:

Registration
- When a service gets installed or “comes up”, it needs to be registered with the discovery layer initially. An example of this is Registrator (https://github.com/gliderlabs/registrator), which reacts to Docker containers starting and sends key/value pairs of data to a tool like Consul or etcd to keep the discovery data about the service current. Such information could be the IP endpoint, port, API URL/path, resources, etc. that can be used by the service.
Discovery
- Discovery is the other end of registration. When a service wants to use (for instance) a “proxy”, how will it know where the proxy lives or how to access it? Typically in applications this information is in a configuration file or hard-coded into the app. With service discovery, all the app needs to know is how to query the information owned by the registration mechanism. For instance, an app can start and immediately ask “Where is ‘Proxy’?” and the discovery mechanism can respond by saying “Here is the Proxy that’s closest to you” or “Here is the first Proxy available”, along with the IP and port of that proxy. The app can then just use those values, typically given in JSON or XML, inside the application, thus never hard-coding any configuration anywhere.
Consume
- Last but smallest: when the application receives the response back from the discovery mechanism, it must know how to process and use the data. E.g. if you’re asking for a proxy or asking for a database, the information given back would be different for the proxy versus how you would actually access the database.
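The three steps above can be sketched with a toy, file-backed registry. To be clear, this is not how Consul, etcd or Registrator work internally; the directory-as-registry, the function names and the 10.0.0.x addresses are all assumptions made up to show the flow of register, discover, consume.

```shell
#!/bin/bash
# Toy stand-in for a discovery layer: one file per service name,
# one "host:port" line per registered instance.
REGISTRY_DIR="${REGISTRY_DIR:-$(mktemp -d)}"

# Registration: a service announces where it can be reached.
register_service() {
  local name="$1" host="$2" port="$3"
  printf '%s:%s\n' "$host" "$port" >> "$REGISTRY_DIR/$name"
}

# Discovery: answer "where is <name>?" with the first instance available.
discover_service() {
  local name="$1"
  [ -s "$REGISTRY_DIR/$name" ] || return 1
  head -n 1 "$REGISTRY_DIR/$name"
}

# Two proxy instances come up and register themselves.
register_service proxy 10.0.0.5 3128
register_service proxy 10.0.0.6 3128

# Consume: the app parses the answer instead of hard-coding an address.
if endpoint="$(discover_service proxy)"; then
  proxy_host="${endpoint%%:*}"
  proxy_port="${endpoint##*:}"
  echo "Using proxy at $proxy_host port $proxy_port"
fi
```

A real discovery tool adds the parts that matter at scale, such as health checks, expiry of dead instances and smarter selection than “first available”.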

Persistence and Backing Services

Today most applications are stateless, which means that they do not own any persistence themselves. You can think of a stateless application as a web server: the web server processes requests and talks to a database, but the database is another microservice, and this is where all the state lives. We can scale the web server as much as we want, and even actively load balance those endpoints, without ever worrying about any state. However, the database in this example is something we should worry about, because we want our data to be available and protected at all times. You can’t (today) spin up your entire application stack (e.g. MongoDB, Express.js, Angular.js and Node.js) in different services (containers) and not worry about how your data is stored. If you do this today, you need persistent volumes that are flexible enough to move with your app’s container, which is hard to do: the data container is just not as flexible as we need it to be in today’s architectures like Mesos and Cloud Foundry.

Today persistence is added via Backing Services (http://12factor.net/backing-services), which are persistence / data layers that exist outside of the normal application lifecycle. This means that in order to use a database, one must first create the backing service and then bind it to the application. Cloud Foundry does this today via “cf create-service” and “cf bind-service APPLICATION SERVICE_INSTANCE”, where SERVICE_INSTANCE is the backing store; you can see more about that here. I won’t dig into this any further, other than to say that this is a problem that needs to be solved, and making your data services as flexible as the rest of your microservices architecture is no easy feat. The link below is a great article by Luke Marsden of ClusterHQ that talks about this very issue. http://www.infoq.com/articles/microservices-revolution

I also wanted to mention an interesting note on persistence in the way Netflix deploys Cassandra. All the data that Netflix uses is deployed on Amazon EC2 instances, and they use ephemeral storage! This means that when a node dies, all the data on it is gone. But they don’t worry about this type of issue anymore, because Cassandra’s distributed, self-healing architecture allows Netflix to move their persistence layers around and automatically scale them out when needed. I found out that they do run incremental backups to S3, by briefly speaking with Adrian Cockcroft at office hours at the O’Reilly Software Architecture conference. I found this to be a pretty interesting illustration of how Netflix runs operations for its data layers with Cassandra, showing that these cloud-native, flexible, decoupled applications are actually working in production and remain reliable and resilient.
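From the application's side, a backing service usually shows up as a locator handed in through the environment rather than something baked into the code. Here is a small sketch of that idea; the DATABASE_URL variable, the parse_backing_url helper and the mongodb://db.internal address are assumptions for illustration (Cloud Foundry itself exposes bound services differently), and URLs with embedded credentials are not handled.

```shell
#!/bin/bash
# parse_backing_url <scheme://host:port/name>
# Prints "host port name" for a simple backing-service URL.
parse_backing_url() {
  local url="$1"
  local rest="${url#*://}"      # host:port/name
  local hostport="${rest%%/*}"  # host:port
  printf '%s %s %s\n' "${hostport%%:*}" "${hostport##*:}" "${rest#*/}"
}

# The platform (or the operator) supplies the locator; the default here
# is only a placeholder so the sketch runs on its own.
url="${DATABASE_URL:-mongodb://db.internal:27017/myapp}"
read -r db_host db_port db_name <<< "$(parse_backing_url "$url")"
echo "Connecting to $db_name on $db_host:$db_port"
```

Swapping the database for another instance then becomes a matter of changing the environment, not rebuilding the stateless service.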

Major Players

Some of the major players in the field today include: (I may have missed some)

Pivotal
Apache Mesos
Joyent (Manta)
IBM Bluemix
Cloud Foundry
Amazon Elastic Container Service
OpenShift by Red Hat
OpenStack Magnum
CoreOS (with Kubernetes) see (Project Tectonic)
Docker (and Assorted Tools/Binaries)
Canonical’s LXD
Tutum
Giant Swarm

There are many other open source tools at work, like Docker Swarm, Consul, Registrator, Powerstrip, Socketplane.io (now owned by Docker Inc), Docker Compose, Fleet, Weave, Flocker and many more. This is just a token of how this field of technology is booming, and we’re going to see many fast changes in the near future. It’s clear that deploying a service without caring about the “right layers” or infrastructure will be key in the future. Enabling data flexibility without tight couplings to the service is part of the overall application design or the data service. These architectures can be powerful for your applications and for your data itself. Ecosystems and communities alike are clearly coming together to try to solve problems for these architectures. I’m sure some things are coming, so stay tuned.

Microservices on your laptop

One way to get some experience with these tools is to run some examples on your laptop. Check out Lattice (https://github.com/cloudfoundry-incubator/lattice/) from Cloud Foundry, which allows you to run some microservice-like containerized workloads. This post is more about the high-level thinking, and I hope to have some more technical posts about technologies like Lattice, Swarm, Registrator and others in the future.

Puppet, Chef, Orchestration and DevOps:

Ryan's Thoughts

"let's work within some interesting technology and write about it"
