Migrating To Kubernetes

The fourth quarter of 2016 saw us embark on the first stage of our migration to the cloud, starting with planning the migration of our eCommerce websites to the Google Cloud Platform (GCP) Container Engine.

Firstly, a bit of background on our previous infrastructure: each client website had at least two compute instances, of varying spec depending on the levels of traffic we were expecting, which were added to a load balancer with a static IP. Each compute instance ran on GCP Compute Engine.

Our websites don’t connect to a persistence layer, nor do they connect directly with a back-end platform. Instead, they all utilise the Venditan Commerce API (VC-API) to obtain all the data they require to display the website to the end user. This abstraction removes an element of complexity from the migration, allows us to focus primarily on the website itself, and lets us control the cutover by switching DNS.

The previous infrastructure made it difficult to scale with demand, as adding another instance required several steps before it could be added to the load balancer.

Using the GCP Container Engine removes this headache: by creating a container cluster, you effectively instruct Container Engine to manage the instances for you. A container cluster is a managed group of uniform VM instances for running Kubernetes. GCP Container Engine allows you to select how powerful you want each machine to be, which will directly impact the resources available to each deployment. It’s fine to be fairly conservative with the machine specification at this point, as you can always increase the number of nodes in your cluster as required.
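
As a rough sketch, creating and later growing a cluster boils down to a couple of gcloud commands. The cluster name, zone, machine type and node counts below are hypothetical rather than our actual configuration:

# Create a conservatively specced three-node cluster
gcloud container clusters create web-cluster \
    --zone=europe-west1-b \
    --machine-type=n1-standard-1 \
    --num-nodes=3

# Grow the cluster later as demand requires
gcloud container clusters resize web-cluster \
    --zone=europe-west1-b --size=5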

GCP Container Engine recently released the ability to have your clusters automatically upgraded and repaired. We have disabled both of these options to give us more control over when the upgrades happen. The upgrades themselves are easy to do, but we have noticed a few minutes of intermittent downtime during the upgrade process, so we like to do it during the early hours (GMT) to reduce the impact on our clients.

Within the container cluster there are node pools, which you can easily find within the GCP console. A new feature, ‘Autoscaling’, can also be seen here, but it is currently still in beta and does not yet settle on the optimum number of nodes, so we have it turned off for the time being until the bugs are ironed out.

Cluster setup is dependent on your requirements, so the spec of your cluster and its nodes will differ according to your clients’ needs and what you are hosting (traffic, type of application, etc.). Clusters are useless without services and deployments, which allow you to create external services such as a web service, or internal services such as memcache that are used by your other services and deployments. We created services for ‘web’, Redis and memcache.
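
To give a flavour of what a service and deployment look like, here is a minimal sketch of an internal memcached Deployment and Service. The names, image tag and memory cap are illustrative, not our actual manifests:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: memcache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memcache
  template:
    metadata:
      labels:
        app: memcache
    spec:
      containers:
      - name: memcache
        image: memcached:1.5-alpine
        args: ["-m", "64"]            # cap memcached at 64MB of item memory
        ports:
        - containerPort: 11211
---
apiVersion: v1
kind: Service
metadata:
  name: memcache                      # other pods reach it as memcache:11211
spec:
  selector:
    app: memcache
  ports:
  - port: 11211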

The memcache service is a step forward for us: with our previous infrastructure we had one instance in Compute Engine running memcache to service all of our websites. With the move to Kubernetes, each website has its own memcache service, improving resilience and delivering a more robust solution for our clients. Since the move to Kubernetes we have not had any issues with any of the memcache services, whereas on the previous infrastructure problems (mainly running out of memory) were a regular occurrence.

One common area of concern is the deployment process, but with Kubernetes we have seen a big improvement. We create images that are pushed to a GCP bucket and then used by the containers. The image is the environment for each website, including all Apache configuration and SSL certificates, plus the application itself. This means that the deployment and rollback processes are simply a case of swapping out the image tag version currently used by the containers.

Kubernetes will then scale down the old deployments that used the old image and spin up new deployments that use the new image. This removes the potential for users to see an issue and instead acts as a seamless switch between the two versions of the application you’re deploying.
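
In practice the whole release is a couple of kubectl commands. This is a hypothetical sketch; the deployment name, container name and image tag are illustrative:

# Roll out a new release by swapping the image tag...
kubectl set image deployment/web web=eu.gcr.io/our-project/web:v42

# ...watch the rolling update complete...
kubectl rollout status deployment/web

# ...and roll back to the previous image if anything looks wrong
kubectl rollout undo deployment/web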

Although Kubernetes has given us the opportunity to improve our infrastructure, the migration itself depended on a number of other technologies.

We utilise Docker heavily to build the server environment from Alpine packages, setting up the Apache/nginx web service and the configuration files holding the numerous environment settings our front-end applications use. Docker also provides a stable, production-like environment in which our development team can work in all scenarios, but on their local machines.
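
A stripped-down sketch of such an image might look like the following. The Alpine version, package names and paths are illustrative, not our actual build:

FROM alpine:3.5

# Web server and PHP runtime from Alpine packages
RUN apk add --no-cache nginx php7 php7-fpm

# The environment: server configuration, SSL certificates and the application
COPY docker/nginx.conf /etc/nginx/nginx.conf
COPY docker/certs/ /etc/nginx/certs/
COPY src/ /var/www/html/

EXPOSE 80 443
CMD ["sh", "-c", "php-fpm7 && nginx -g 'daemon off;'"]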

With the migration to Kubernetes we upgraded our front-end application to PHP 7, and have seen marginal performance improvements as a result. Add nginx into the mix and collectively you start to see a bigger improvement. With nginx we have seen average server connection time rise slightly and average page download time halve, with no difference in average server response time. From this you can determine that the end user will see a benefit, and with every upgrade and every change in technology we put the end user first, to ensure we are delivering the most performant solution possible.

As a whole, we’ve definitely had a successful migration to Kubernetes. The process has delivered a better service to our clients, and as developers we have more trust in the infrastructure. Developments since the migration have been easier, such as the move from Apache to nginx and the shift towards HTTP/2 and HTTPS across all of our websites. We’re only six months into this journey with Kubernetes, and as it continues to develop we expect to deliver an even better solution to our clients.

Michael Simcoe
31st March 2017

Programmable Infrastructure Vs Infrastructure As Code

Programmable Infrastructure? Infrastructure as Code? If you’ve never heard of these terms, you’re probably wondering what the hell they are. There is no competition between the two phrases, as they both refer to the same concept; which you use depends on which side of the buzzword fence you want to sit. For the remainder of this article I choose to use the phrase Infrastructure As Code, as it’s easier on the tongue. 🙂

With the advent of cloud services such as Amazon EC2 and Google Cloud Platform, with simple exposed APIs and SDKs (where available), and infrastructure management tools such as Puppet, Chef and Docker containers, gone are the days when a tech company needed to hire dedicated system administrators to manage the infrastructure layer of its software platform (financial controllers love to hear this kind of stuff). Now all a tech company needs is a developer with an understanding of APIs and SDKs who can write code to provision, deploy and manage infrastructure services. This is what Infrastructure As Code boils down to.

Of course, you could argue that the developer nirvana described above is nothing short of configuration management, but the two main differences with Infrastructure As Code are:

  • It is fully automated
  • A developer can build the infrastructure blueprint directly into the core of an application or into the build / deploy component of an application simply by writing code that describes the infrastructure. (Now isn’t that awesome!)

A typical, highly simplified application workflow that takes advantage of Infrastructure As Code could be along the lines of the sketch below:
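
This is a hypothetical sketch using the gcloud CLI; the instance names, zone, machine type and archive name are all illustrative:

# 1. Provision: create the web instances entirely from code
for i in 1 2; do
    gcloud compute instances create "web-$i" \
        --zone=europe-west1-b \
        --machine-type=n1-standard-1
done

# 2. Deploy: push the application build to each instance
for i in 1 2; do
    gcloud compute copy-files app.tar.gz "web-$i":/tmp/ --zone=europe-west1-b
done

# 3. Manage: tear the whole environment down again when finished
for i in 1 2; do
    gcloud compute instances delete "web-$i" --zone=europe-west1-b --quiet
done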

Infrastructure As Code offers a number of advantages to a development team looking to take their ninja DevOps skills to the next level, a few of which are:

  • You end up with documentation that describes your infrastructure, from which you can generate other representations such as pretty diagrams. (Don’t we all love pretty diagrams!)
  • You only need to look in one consistent place to make changes to your infrastructure which makes maintenance a breeze
  • You can chop and change your infrastructure as fast as your developers can write code against an alternative API, whether internal or external

You might ask whether Infrastructure As Code is a concept worth adopting in your tech company or your DevOps team; it certainly offers a level of flexibility and raw infrastructure management power that you’ll find hard to ignore. If you haven’t already, join the bandwagon.

Kamba Abudu
26 Jun 2016

Terminal Velocity – 4 Great Ways To Work With Automatic Expansion In The Shell

With so much getting done in the shell, lots of little shortcuts can all add up to make things quicker in the future. After seeing people doing it the long way round, here’s the first in our series on efficiency hacks.

Note that in this series I’m only going to cover the three most common shells people use today: Bash, Ksh and Zsh. If you use something else you may find some of this advice doesn’t work or is unreliable; in that case I suggest consulting the manual for your specific shell.

1) Globbing – An oldie but goodie that bears repeating

Globbing is the name given to the pattern matching for files. This should be something most regular users have used, but it’s always handy to have a cheat sheet:


Pattern   Description                                                RegExp equivalent
*         Matches any number of any character                        .*
?         Matches exactly one character                              .
[AD-Z]    Matches any character in the specified set or range.       [AD-Z]
          Characters may be listed without hyphens to select from
          exactly those characters, in the same way RegExps work.
          In the instance shown this will match every upper case
          character except B and C, since they’re not listed, nor
          included in the range D-Z.
~name     Matches the named user’s home directory                    (no equivalent)

Globbing in practice:

# Globbing basics
# Create the sample files
touch a ab bc cd de
# * Matches anything and everything
echo *
a ab bc cd de
echo a*
a ab
# ? Matches a single character
echo ?
a
echo ??
ab bc cd de
# Square brackets (and their contents) match a single character
echo [abd]*
a ab bc de
echo [a-d]?
ab bc cd de
echo [ad-z]?
ab de
# ~ on its own matches your home directory
echo ~
/home/llord
# ~ followed by a username matches that user’s home directory
echo ~tom
/home/tom
echo ~root
/root

2) Extglob – Expanding on the available options

Almost all shells come with an extension to the typical glob system that allows even more options, but before it can be used it needs to be turned on, and the command to do so depends on your exact shell. If you’re unsure, you can run

echo $SHELL
/bin/bash

to see what shell you’re running at the moment


Shell   Command to turn on   Command to turn off
Bash    shopt -s extglob     shopt -u extglob
Ksh     N/A – always on      N/A – always on
Zsh     set -o ksh_glob      set +o ksh_glob

Once enabled:


Pattern      Description                                                RegExp equivalent
@(123|456)   Matches any one of the listed patterns, eg: “123” or “456”   (123|456)
!(123)       Matches anything that is not the sub group                 (?!123) (approximate)
?(123)       Optionally matches the sub group                           (123)?
*(123)       Matches the sub group any number of times (including 0)    (123)*
+(123)       Matches the sub group at least once                        (123)+

Extglob in practice:

# extglob
# Create sample files
touch cheesecake chocolate-cake apple-pie pear-pie rice-pudding
echo *@(cake|pudding)
-bash: syntax error near unexpected token `('
shopt -s extglob
echo *@(cake|pudding)
cheesecake chocolate-cake rice-pudding
# You can even embed other globbing patterns, like ? or *
echo *-@(???|c*)
apple-pie chocolate-cake pear-pie
# Be careful using the not pattern: * is greedy, and the not
# operator !(…) can match as little as a single character
echo *!(pie)
apple-pie cheesecake chocolate-cake pear-pie rice-pudding
echo !(*pie)
cheesecake chocolate-cake rice-pudding

3) More Globbing Options

There are more features than just extglob that can be turned on through shell options that affect the nature of globbing, although not all are available on all possible shells:


Option     Description                                              Bash command        Ksh command       Zsh command
dotglob    * will match .files like the .git directory, but not     shopt -s dotglob    n/a               set -o globdots
           the . or .. symlinks.
nullglob   Removes any patterns that don’t match anything instead   shopt -s nullglob   set -o nullglob   set -o nullglob
           of keeping the literal pattern.
failglob   Patterns that don’t match result in an error that        shopt -s failglob   n/a               set -o nomatch
           prevents execution of the command. Very useful if you
           want to make sure your patterns are correct.
globstar   Enables ** as a pattern to recursively match any         shopt -s globstar   set -o globstar   on by default
           directory.

Other options in practice:

# Other glob options
# Set up
mkdir 5th 6th 6th/7th
touch .first .second third fourth 5th/a 5th/b 6th/c 6th/7th/d
# By default * ignores hidden files
echo *
5th 6th fourth third
shopt -s dotglob
# with dotglob they should match
echo *
5th 6th .first fourth .second third
# By default unmatched params are interpreted as literals
echo cat*
cat*
shopt -s nullglob
# with nullglob they’re removed from the param list
echo cat*

# Turn nullglob back off and turn failglob on
shopt -u nullglob
shopt -s failglob
# with failglob on you get an error
echo cat*
-bash: no match: cat*
# without globstar there are no recursive calls
echo **
5th 6th .first fourth .second third
shopt -s globstar
# With globstar it is a recursive directory search
echo **
5th 5th/a 5th/b 6th 6th/c 6th/7th 6th/7th/d .first .second third fourth

4) Brace Expansion

While globbing will only expand to match filenames that actually exist, brace expansion will always expand to every possible combination of the provided input.

Globbing is useful for working with files that already exist, whereas brace expansion is a general purpose tool that can generate file or folder names that don’t exist yet, as well as handling a wide range of jobs not related to the filesystem at all.

Brace expansion works in two ways:

1. As a comma separated list:

{a,b,c}

# Brace expansion – lists
# Brace expansion is really useful for producing cartesian products,
echo {a,b,c}{1,2,3}
a1 a2 a3 b1 b2 b3 c1 c2 c3
# Creating a whole dir structure without repeating names,
echo {application/{controllers,library,modules,plugins,view},pub}/
application/controllers/ application/library/ application/modules/
application/plugins/ application/view/ pub/
# Or even just rename a file with a minor change
echo V{ei,ie}wHelper
VeiwHelper ViewHelper
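
Since the expansion happens before the command ever runs, the same pattern handed to mkdir will create the whole directory tree for real:

# Create the directory structure expanded above in one go
mkdir -p {application/{controllers,library,modules,plugins,view},pub}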

2. As a range:

{a..c}

# Brace expansion – ranges
# Range expansion is good for lists and loops
for i in {1..25}; do echo -n "$i "; done
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# For example, loop over every letter in the alphabet,
# and count file starting with that letter:
for i in {a..z}; do echo -n "$i "; ls -1 $i* | wc -l; done
a 0
b 1
c 7
d 0
e 12

We hope you find these shortcuts useful. The next part, on making your life easier with directories, is coming soon.

Liam Lord
24 Jul 2015

SSL and PHP Streams – You Are Doing It Wrong

The upcoming PHP 5.6 release brings with it a number of improvements to encrypted PHP streams, both internally and externally. In this series of articles I will try to cover the most important changes, and how they affect your code.

This article will focus on how to get the best security level in code that needs to run on PHP versions below 5.6, and will highlight some of the gaps in the currently available functionality. Version 5.4.13 is the earliest version that supports all the options described below; if you are running something earlier than this, then you really should consider upgrading to at least the latest version of the 5.4 series¹.

Client Communication

PHP provides some very simple APIs to use encrypted connections for client communication. It lets you make an encrypted HTTP request to retrieve some data with just one line of code:

<?php

$data = file_get_contents('https://example.com/file.ext');

Great! We just retrieved some data from a remote server in a completely secure manner, right? Wrong.

Problem 1: Peer verification

One of the most important parts of using encrypted connections is verifying that the remote peer you are communicating with is really who they say they are, and who you are expecting them to be. Without this crucial step, man-in-the-middle (MITM) attacks are trivial, and even though the data arrives on your machine encrypted it could have been stolen or altered by a 3rd party along the way.

If you’ve ever tried to use the cURL extension to retrieve an HTTPS resource, chances are you’ve seen this message:

SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

That’s because cURL attempts to verify and validate the certificate presented by the server and fails to do so, usually because it cannot be traced back to a trusted root certificate authority (CA). Often this is because the root certificate authority list is not installed or not correctly configured. A little research will then lead you to configure the `curl.cainfo` setting with a valid CA file, and your code will now work as expected.

But what about streams? If you make the same request using streams it works without complaining, so that’s clearly a better option – except it isn’t, because before PHP 5.6, PHP streams do not attempt to verify the peer certificate by default! This doesn’t mean the streams are fundamentally insecure, it just means you need to use stream context options to make them secure.

Let’s update our code to verify the peer certificate against a list of trusted root certificate authorities:

<?php

$contextOptions = [
    'ssl' => [
        'verify_peer' => true,
        'cafile' => '/path/to/cafile.pem',
        'CN_match' => 'example.com',
    ]
];
$context = stream_context_create($contextOptions);

$data = file_get_contents('https://example.com/file.ext', false, $context);

This allows us to verify the server’s certificate against a trusted CA chain. Setting `verify_peer` to `true` instructs PHP to perform the verification process, and the `cafile` option supplies the trusted CA data to verify against. This can also be specified using the `capath` option, which allows you to store the trusted certificates in separate files in the specified directory. More details of how this needs to be formatted are available at openssl.org.

We must also specify the expected peer name on the presented certificate with the `CN_match` option; it will not be inferred from the URL, and if it is not specified the name on the certificate will not be validated at all. It must match the Common Name field of the certificate; the Subject Alternative Names field will not be considered, which is a pretty major limitation on the modern internet.

There is currently no mechanism for verifying a certificate fingerprint.

Problem 2: Cipher lists

There are many different ways to encrypt data – many different algorithms with many different sub-variants. PHP simply lets OpenSSL deal with this problem, by specifying `DEFAULT` as the list of acceptable algorithms (cipher list). Unfortunately, this list allows almost anything, including a number of very weak ciphers that can be easily broken by a determined attacker, allowing them to steal data.

The `ciphers` context option allows us to specify a list of acceptable ciphers to use with the connection; if one of these cannot be negotiated by both server and client, the connection will fail before any potentially sensitive data is transmitted. Let’s update our code to specify that only `HIGH` encryption cipher suites may be used²:

<?php

$contextOptions = [
    'ssl' => [
        'verify_peer' => true,
        'cafile' => '/path/to/cafile.pem',
        'CN_match' => 'example.com',
        'ciphers' => 'HIGH',
    ],
];
$context = stream_context_create($contextOptions);

$data = file_get_contents('https://example.com/file.ext', false, $context);

An explanation of the format and available options for the cipher list is available at openssl.org. Mozilla’s recommended cipher list for servers is published here.

Problem 3: Protocol support

PHP streams support SSL versions 2 and 3, and TLS version 1.0. Moreover, it’s not possible to directly specify which protocol will be used with an `https://` or `ftps://` connection. SSLv2 is very broken, and SSLv3 is also less than desirable. On the modern internet, there are very few servers that do not support at least TLS version 1.0.

While it’s not currently possible to directly specify which protocol to use, it is possible to forbid protocols using the cipher list, so let’s update our code to require TLS:

<?php

$contextOptions = [
    'ssl' => [
        'verify_peer' => true,
        'cafile' => '/path/to/cafile.pem',
        'CN_match' => 'example.com',
        'ciphers' => 'HIGH:!SSLv2:!SSLv3',
    ],
];
$context = stream_context_create($contextOptions);

$data = file_get_contents('https://example.com/file.ext', false, $context);

When creating raw socket streams it is possible to specify the protocol directly, using either the appropriate URI scheme (`sslv2://`, `sslv3://` or `tls://`) when creating the socket, or the appropriate constant when enabling encryption on an existing TCP socket via `stream_socket_enable_crypto()`.
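
As a quick illustration (the host and context options here are examples only, not a recommendation), the two approaches look like this:

<?php

$context = stream_context_create([
    'ssl' => [
        'verify_peer' => true,
        'cafile' => '/path/to/cafile.pem',
        'CN_match' => 'example.com',
    ],
]);

// Fix the protocol at connect time with the URI scheme...
$socket = stream_socket_client('tls://example.com:443', $errNo, $errStr, 30, STREAM_CLIENT_CONNECT, $context);

// ...or connect in plain text and upgrade the existing socket to TLS
$socket = stream_socket_client('tcp://example.com:443', $errNo, $errStr, 30, STREAM_CLIENT_CONNECT, $context);
stream_socket_enable_crypto($socket, true, STREAM_CRYPTO_METHOD_TLS_CLIENT);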

Even so, our code does not support the more modern TLS 1.1 and 1.2 protocols, which contain a number of security improvements over TLS 1.0. Luckily, in practice, these improvements are largely theoretical: the known attacks against TLSv1.0 can and do work, but they are currently impractical to execute against real-life applications without a level of system compromise that would render the attack pointless, since by that point the attacker could already steal your data far more easily.

Problem 4: TLS compression attack vulnerability

The CRIME attack vector against TLS can be easily mitigated by disabling protocol-level compression. Unfortunately, in PHP it’s enabled by default³. Luckily, since PHP 5.4.13 it can be easily disabled using the simple boolean context option `disable_compression`. Let’s update our code to use it:

<?php

$contextOptions = [
    'ssl' => [
        'verify_peer' => true,
        'cafile' => '/path/to/cafile.pem',
        'CN_match' => 'example.com',
        'ciphers' => 'HIGH:!SSLv2:!SSLv3',
        'disable_compression' => true,
    ],
];
$context = stream_context_create($contextOptions);

$data = file_get_contents('https://example.com/file.ext', false, $context);

Server Communication

Using PHP for managing server streams is less common, but it is possible and has been for a long time. Problems 2, 3 and 4 described above also apply to server streams (problem 1 doesn’t come into play unless you require a client certificate) but there are other issues with server streams that are currently not resolvable in PHP.

Problem 5: Cipher order

Some attack vectors are only possible if certain ciphers are used. Currently PHP instructs OpenSSL to use the cipher priorities specified by the client, potentially leaving the server open to attack.

Problem 6: TLS renegotiation attacks

As problems go, this is a big one. SSL version 3 and all versions of TLS allow renegotiation of the connection settings after it has been created, changing protocol and cipher lists. This process is vulnerable to a limited MITM attack (an attacker can inject data but cannot see the response), and it also opens a denial-of-service (DoS) flaw on the server.

Negotiating a secure connection is considerably more expensive for the server than it is for the client in terms of CPU cycles, meaning that anyone with a laptop can bring your super-powerful server down simply by repeatedly renegotiating the connection. Moreover, this potential DoS attack is very difficult to detect on the edge or server firewall, because it only requires a single TCP connection and does not create excessive traffic on that link.

At present, PHP provides no way to disable renegotiation, or limit the rate at which renegotiation requests will be honoured.

Summing Up 

Using secure connections in PHP streams is not as simple as it appears on the surface. Our original “one line” of code is now a complex set of options. It’s harder to read, and it’s impossible to write it correctly without understanding precisely what the various options do.

In order to obtain an acceptable level of security, the user is required to understand some of the technical elements of the cryptography they are attempting to use. This can be daunting, and as a result many applications are deployed using insecure settings, some of them so bad as to almost completely negate the expense of using encryption in the first place.

In the next article, we’ll look at how some of the changes in PHP 5.6 will make life a lot simpler for the average PHP developer to create truly secure communication routines.

It is important to note that the code samples from the article do not represent the one true “correct” way to make your arbitrary HTTPS request. You *must* understand the implications of these options before you use them.

¹ At the time of writing, the current release in the 5.4 branch is 5.4.29. This release, along with the 5.5.13 sister release, contains a behavioural regression with unserialising strings which breaks (amongst other things) elements of PHPUnit and Doctrine. It’s probably best to avoid these specific releases; 5.4.30 and 5.5.14 should “fix the fix”.

² This is not a recommendation, merely an example, although it’s not a bad jumping-off point.

³ This does not necessarily mean that a given connection will be compressed, as the server can also refuse to honour the client’s request to compress the data.

Chris Wright
26 June 2015

Meet Yaf… What?

At Venditan, we like to keep up with the latest technology to help us build fantastic websites for our clients. In January of last year the stable 2.2.9 release of Yaf (Yet Another Framework) arrived, and we just had to have a play. Yaf is the first PHP MVC framework to be written in C and built as a PHP extension. The framework was written by Laruence, a PHP development team member who also leads APC development. It is considered the fastest and lowest resource-consuming PHP framework around at the moment, and has been well tested in production applications in a number of large organisations.

As of now, we have three websites in production that are implemented using Yaf, and the performance is incredible, not that this is unexpected since it’s written in C. The benchmarks completed on Yaf are also overwhelming, with one result showing Yaf handling 5331 requests per second, as opposed to 634 requests per second for Zend and 2300 requests per second for CodeIgniter.

Caught your attention? I hope so, but where to start? Yaf is a PECL extension and as such is easy to install. The documentation available is fairly limited, with the most detailed documentation written in Chinese, which proves a little difficult to understand using Google Translate. We were fortunate enough to find some example PHP applications on GitHub which we were able to review to help us write our first sample application. A flowchart of Yaf’s request lifecycle also helped me understand Yaf better, and allowed me to see Yaf’s support for Bootstrap and its plug-in mechanism.
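
To give a feel for how little is involved, here is a minimal sketch of a Yaf entry point and controller; the paths and names are illustrative only:

<?php

// index.php: a minimal, illustrative Yaf entry point
$app = new Yaf_Application(__DIR__ . '/conf/application.ini');
$app->bootstrap()   // runs the _init*() methods defined in Bootstrap.php
    ->run();

// application/controllers/Index.php
class IndexController extends Yaf_Controller_Abstract
{
    public function indexAction()
    {
        // hand a value to the view template for rendering
        $this->getView()->assign('title', 'Hello from Yaf');
    }
}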

Over the past year, we have written a Yaf application that uses our SprintSDK to communicate with the SprintEcommerce API. We have extended the Yaf Route Interface, allowing us to manage routing based on responses from the API. Adding to the router stack within the Bootstrap is very simple and well documented within the PHP Docs.
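
As a rough sketch of the approach (the class and the API-driven lookup are hypothetical, not our actual implementation):

<?php

// A hypothetical custom route built on Yaf_Route_Interface
class ApiRoute implements Yaf_Route_Interface
{
    public function route(Yaf_Request_Abstract $request)
    {
        // decide, for example from an API response, where this URI should go
        $request->setControllerName('Page');
        $request->setActionName('view');
        return true; // true means this route matched; stop trying others
    }

    public function assemble(array $info, array $query = null)
    {
        // rebuild a URI from route parts (newer Yaf versions require this method)
        return '/' . implode('/', $info);
    }
}

// In Bootstrap.php: push the custom route onto the router stack
Yaf_Dispatcher::getInstance()->getRouter()->addRoute('api', new ApiRoute());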

In future posts we will look at the benchmarks in greater detail and how we overcame issues such as multiple template sets, but I hope this quick look at Yaf has inspired you to take a closer look.

Mike Simcoe
2nd Apr 2015