Thursday, May 30, 2019

Getting started with hybrid patterns and practices

Our solutions team here at Google Cloud is made up of solutions architects who are industry veterans and experts in cloud architecture and applications. Our goal is to help you put Google Cloud Platform (GCP) services together to solve your business needs and create the best solution for the infrastructure you’re building.

One topic we work on a lot is hybrid cloud. As we hear from many of our customers, you want to move some of your workloads to the cloud to create a hybrid cloud, with some workloads on-premises and some on GCP. Where do you start? What do you have to think about? What does the topology look like? In this post, we’ll look at some of the solutions that we've published that can help you implement hybrid cloud topologies, starting from the very beginning of setting them up.

Hybrid cloud patterns and best practices


Implementing a cloud architecture that involves workloads that run on-premises, on GCP, and possibly on another cloud provider can be a bit challenging. Let’s start with an overview of the process and what that architecture might look like once you’ve implemented it.

Our Hybrid and Multi-Cloud Patterns and Practices series addresses precisely the types of questions you're probably asking. This series, written by one of our solutions architects, Johannes Passing, distills his decade-plus of experience creating cloud-based architectures into solutions.

The series starts by walking through the preliminaries, like articulating what your goals are for using hybrid cloud. You’ll then see some of the options for moving workloads to the cloud, and which approach might best suit your goals. The discussion is copiously illustrated with diagrams that offer a high-level view of what a hybrid solution might look like, such as this one:



At each stage, you’ll see a list of the advantages of the various approaches to hybrid cloud and a concise list of best practices. Everything in the documents is very much rooted in the author’s hands-on experience with designing these types of systems.

Authentication and single sign-on in hybrid cloud


Managing authentication and authorization in a hybrid environment generally means matching your existing, on-premises identity system with how it's done in GCP. For example, you might already run Active Directory on-premises. How can you map your user identities to GCP identities so that your users don't have to sign in separately to your on-premises services and to GCP?

In a three-part series, Federating Google Cloud Platform with Active Directory, Johannes tackles the topic of integrating Active Directory with Cloud Identity using Google Cloud Directory Sync. This series discusses how to deal with various Active Directory topologies (such as single or multi-forest), and how to perform Windows-integrated authentication and single sign-on (SSO) for apps running on GCP.
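Conceptually, the sync maps each Active Directory user to a Cloud Identity account, often by substituting the UPN domain. Here is a simplified sketch of that mapping idea (illustrative only; the names and logic are hypothetical, not how Google Cloud Directory Sync is actually implemented):

```python
# Illustrative sketch of directory-sync identity mapping (hypothetical,
# not actual Google Cloud Directory Sync behavior): derive a Cloud
# Identity email from an Active Directory userPrincipalName (UPN).
def map_identity(upn, ad_domain="corp.example.com", cloud_domain="example.com"):
    user, _, domain = upn.partition("@")
    if domain != ad_domain:
        raise ValueError(f"unexpected domain: {domain}")
    # Same local part, Cloud Identity domain instead of the AD domain.
    return f"{user}@{cloud_domain}"

print(map_identity("alice@corp.example.com"))  # alice@example.com
```

Because both accounts share the same local part, a user who signs in on-premises can be matched to the same identity in GCP, which is what makes single sign-on possible.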

"Rip and replace" with GKE


There’s another approach to moving systems to the cloud. If you’re modernizing a complex website into a refactored, container-based microservices platform on Google Kubernetes Engine (GKE) on GCP, check out Migrating a monolithic application to microservices on GCP, by solutions architects Théo Chamley, a DevOps engineering veteran, and Marco Ferrari.

As an example in this solution, Théo uses an e-commerce site. You’ll see how to perform the migration feature by feature, avoiding the risks of a large-scale, all-at-once migration. During the migration itself, the application has a hybrid architecture, where some features are in the cloud and some are still on-premises. After the migration is finished, the complete application is hosted in the cloud, but it still interacts with back-end services that remain on-premises. In addition to describing the architecture of various steps in this migration, you’ll see how to take advantage of a variety of GCP services as part of the process, including Cloud Interconnect.
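One way to picture the feature-by-feature approach: a routing layer in front of the application decides, per feature, whether a request goes to a migrated microservice on GCP or to the on-premises monolith. A minimal sketch with hypothetical feature names (not taken from the solution document):

```python
# Minimal sketch of per-feature routing during a gradual migration.
# Feature names and backend labels are hypothetical illustrations.
MIGRATED = {"/cart", "/search"}  # features already extracted to GCP

def route(path):
    # Route by the first path segment: migrated features go to the
    # new microservice, everything else stays on the monolith.
    prefix = "/" + path.lstrip("/").split("/", 1)[0]
    if prefix in MIGRATED:
        return "gcp-microservice"
    return "on-prem-monolith"

print(route("/cart/items"))  # gcp-microservice
print(route("/checkout"))    # on-prem-monolith
```

As each feature is extracted, its prefix moves into the migrated set, shrinking the monolith's surface area one step at a time.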

Wait, there's more


Several other solutions architects have also been writing about hybrid architectures to share best practices and offer advice. Here are a few to check out:

  • In TCP optimization for network performance in GCP and hybrid scenarios, Kishor Aher explains how to tune network performance when moving workloads from on-premises to GCP. You’ll get a look at the details of TCP transmission so that you can understand why his recommendations can help reduce network latencies.
  • What if you want to communicate between GCP and another cloud without using public IP addresses? Etsuji Nakai's solution Using APIs from an External Network shows how to use a private network on Amazon Virtual Private Cloud (Amazon VPC) to emulate an on-premises private network.

Wednesday, May 29, 2019

Building the cloud-native future at Google Cloud

From its first open-source commit five years ago to now, Kubernetes has become the industry standard for modern application architecture. It was built on over a decade of Google’s experience as the world’s largest containerized application user. And it’s from this deep and continued investment that Google Cloud provides industry-leading solutions for running workloads at enterprise scale.

One of the most exciting outcomes of this shift toward cloud-native computing is the innovation built on top of Kubernetes. At Google, we love to solve challenging problems, and then share our experiences at scale with the world. This ethos is what brought Kubernetes to life, and it’s also the force behind Knative, Istio, gVisor, Kubeflow, Tekton, and other cloud-native open-source projects that we lead.

We think of it as our job to not only dream about the future, but also to design and implement it. Here’s an overview of open-source projects tied to Kubernetes that we’re working on. We know that speculating about the future can be tricky, but these projects offer a glimpse into how we’re building a cloud-native future. Let’s take a look.



Start with Kubernetes


Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It is the industry’s de facto container orchestrator, and is the heart of the cloud-native movement.

We’re proud of our contributions to the Kubernetes project, as we serve the community in many important ways. Google remains the top technical contributor to the project, as well as being actively involved in nearly all special interest groups (SIGs), subprojects, the steering committee, and as code approvers and reviewers. We constantly integrate our real-world experience at scale into the project, just as we have from the beginning.

When we look at the future of Kubernetes, we see the API extension ecosystem maturing and growing even further. We also see a more holistic approach to scalability, so it’s not just about how many nodes or pods are deployed, but how Kubernetes is used across real-world, production environments with widely-varying requirements. Improved reliability is another important facet of this work, as even more mission-critical workloads move to Kubernetes.

Istio


Istio is a service mesh that helps manage, secure, and observe traffic between services. The project evolved out of the need for developers adopting microservices to understand and control the traffic between those services without requiring code changes.

Istio uses the Envoy proxy as a sidecar to collect detailed network traffic statistics and other data from the co-located application, as well as provide logging and tracing. It optionally secures traffic using mTLS (and automatically generates and rotates certificates). Finally, it provides Kubernetes-style APIs to provide advanced networking functionality (for example, the ability to run canary tests, change retry policy at runtime, or add circuit-breaking).
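Istio configures these behaviors declaratively through its APIs, but the mechanic behind a canary test is simply weighted routing: a small fraction of requests goes to the new version, the rest to the stable one. A sketch of that idea in Python (illustrative only, not Istio or Envoy code):

```python
import random

# Illustrative weighted routing, the mechanism behind a canary split
# (not Istio code). Weights are per-version traffic percentages.
def pick_version(weights):
    r = random.uniform(0, sum(weights.values()))
    for version, weight in weights.items():
        if r < weight:
            return version
        r -= weight
    return version  # fallback for floating-point edge cases

canary_split = {"v1": 95, "v2-canary": 5}
counts = {v: 0 for v in canary_split}
for _ in range(10_000):
    counts[pick_version(canary_split)] += 1
print(counts)  # roughly a 95/5 split across the two versions
```

In the mesh, this decision happens in the Envoy sidecar, so the application never knows a canary is running; changing the split is a config update, not a deploy.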

The upcoming version, 1.2, will feature a new operator-based installer and numerous testing and quality improvements. For the rest of 2019, componentization and ease of use will take center stage, as well as architectural improvements that will increase modularity, allow powerful dataplane extensibility, and enhance reliability and performance.

Knative


Knative is a Kubernetes-based platform to build, deploy, and manage modern stateless workloads. Knative components abstract away the complexity and enable developers to focus on what matters to them—solving important business problems.

Just last week, the Knative team released the latest version, v0.6. Besides incremental reliability and stability enhancements, this release also exposes more powerful routing capabilities and improved support for GitOps-like operational use cases. Also, starting with this release, developers can now easily migrate simple apps from Kubernetes Deployments without changes, making service deployment easier for anyone who’s familiar with the Kubernetes resource model.

In the 10 months since Knative was announced, a number of commercial offerings have already been built on underlying Knative primitives. Today, the Knative community includes 400+ contributors associated with over 50 different companies, who with the v0.6 release have made 4,000+ pull requests. We are excited about this momentum and look forward to working with the community to further improve the developer experience on Kubernetes.

gVisor


gVisor is an open-source, OCI-compatible sandbox runtime that provides a virtualized container environment. It runs containers with a new user-space kernel, delivering a low-overhead container security solution for high-density applications. gVisor integrates with Docker, containerd and Kubernetes, making it easier to improve the security isolation of your containers while still using familiar tooling. Additionally, gVisor supports a variety of underlying mechanisms for intercepting application calls, allowing it to run in diverse host environments, including cloud-hosted virtual machines.

gVisor was open sourced in May 2018 at KubeCon EU. Since then, the gVisor team has added multi-container support for Kubernetes, released a suite of tests containing more than 1,500 individual tests, released a minikube add-on, integrated it with containerd, and further improved isolation and compatibility. The gVisor team recently began hosting community meetings and is working to grow the users and community around container isolation and gVisor.

Tekton


Tekton is a set of standardized Kubernetes-native primitives for building and running Continuous Delivery workflows. It allows users to express their Continuous Integration, Deployment and Delivery pipelines as Kubernetes CRDs, and run them in any Kubernetes cluster.

We started Tekton last year and donated it to the open Continuous Delivery Foundation earlier this year. Tekton APIs are still in alpha, but we look forward to stabilizing them and adding support for automated deployments, vendor-agnostic pull requests, GitOps workflows, automated compliance-as-code and more!

Forseti Security


Forseti Security is a collection of community-driven, open-source tools to help you improve the security of your Google Cloud Platform (GCP) environments. It takes a snapshot of your GCP resource metadata, audits those resources by comparing their configuration with the policies you defined, and notifies you of violations on an ongoing basis.
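That snapshot-audit-notify loop can be sketched in a few lines of Python (a hypothetical structure for illustration, not Forseti's actual code or data model):

```python
# Hypothetical sketch of a snapshot-then-audit flow: compare each
# resource's configuration against a declared policy and collect
# violations for notification. Not Forseti's actual implementation.
def audit(snapshot, policy):
    violations = []
    for resource in snapshot:
        for key, expected in policy.items():
            if resource.get(key) != expected:
                violations.append((resource["name"], key))
    return violations

snapshot = [
    {"name": "bucket-1", "public_access": False},
    {"name": "bucket-2", "public_access": True},
]
policy = {"public_access": False}

print(audit(snapshot, policy))  # [('bucket-2', 'public_access')]
```

Running this comparison on a schedule, against a fresh snapshot each time, is what turns a one-off audit into ongoing violation monitoring.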

With Forseti, you can ensure your GKE clusters are provisioned with security and governance guardrails by scanning your GKE resource metadata and making sure the configurations are as expected. Forseti’s Validator Scanner lets you define custom security and governance constraints in Rego to check for violations in your GKE resource metadata.

In addition, you can reuse these constraints for pre-deployment checks with Terraform Validator. A set of canned constraints are available in the Policy Library. The Forseti community will continue contributing new constraints to harden your GKE environment. Get started with Forseti Validator Scanner here.

Kubeflow


Kubeflow is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Its goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML on a variety of infrastructures. The Kubeflow project is supported by 100+ contributors from 20+ organizations.

Kubeflow is on the road to 1.0, and we're hard at work building a powerful development experience that will allow data scientists to build, train and deploy from notebooks, as well as the enterprise stability and features ML operations teams need to deploy and scale advanced data science workflows. Hear more about this effort in this session from KubeCon NA 2018, and follow us on Twitter @kubeflow.

Skaffold


Skaffold is a command line tool that makes it fast and easy to develop applications on Kubernetes. Skaffold automates the local development loop for you; skaffold dev rebuilds your images and redeploys your app to Kubernetes on every code change. You can also use Skaffold as a building block for CI/CD pipelines with skaffold run. It’s language-agnostic and has an increasing number of configurable, flexible image builders (jib, docker, bazel, kaniko), deployers (kustomize, kubectl, helm) and automated tagging policies, making it a great fit for more and more Kubernetes development workflows.

We use Skaffold under the hood for Cloud Code for IntelliJ and VSCode and also for Jenkins-X. Skaffold is currently in beta, and will soon graduate to 1.0.0.

Follow our progress on our GitHub repo, and share your thoughts with the #skaffold hashtag on Twitter!

Gatekeeper


Gatekeeper is a customizable admission webhook. It allows cluster administrators and security practitioners to develop, share and enforce policies and config validation via parameterized, easily configurable constraint CRDs. Constraints are portable and could also be used to validate commits to the source-of-truth repo in CI/CD pipelines.

With Gatekeeper, you can help developers comply with internal governance and best practices, freeing up your time and theirs. You can do things like require developers to set ownership labels, apply resource limits to their pods, or prohibit them from using the :latest tag. Using Gatekeeper's audit functionality, you can easily find any pre-existing resources that are in violation of current best practices.
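Real Gatekeeper constraints are expressed as CRDs backed by Rego policies, but the check behind a no-:latest-tag rule is simple. Here it is sketched in Python purely for illustration (not Gatekeeper or Rego code):

```python
# Illustrative only: the check a "no :latest tag" constraint performs,
# expressed in Python rather than Rego. An image with no tag at all
# implicitly means :latest, so it is flagged too.
def latest_tag_violations(pod_spec):
    violations = []
    for container in pod_spec.get("containers", []):
        image = container["image"]
        if ":" not in image or image.endswith(":latest"):
            violations.append(image)
    return violations

pod = {"containers": [{"image": "nginx:latest"}, {"image": "app:v1.2"}]}
print(latest_tag_violations(pod))  # ['nginx:latest']
```

In Gatekeeper, the same logic runs as an admission webhook, so a violating pod is rejected before it ever reaches the cluster; the audit mode applies it retroactively to resources that already exist.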

Google is proud to be collaborating with Microsoft and Styra (the creators of Open Policy Agent) on this project. Gatekeeper is currently in alpha and we welcome user feedback and contributions.

Krew


Krew is a plugin manager for kubectl that helps users discover and install kubectl plugins to improve their kubectl experience. Originally developed at Google, Krew is now a part of Kubernetes SIG CLI.

The future is now


Building cloud-native apps on top of Kubernetes isn’t some abstract, aspirational goal. The tools you need are here today, and they’re only getting better.

Tuesday, May 28, 2019

Integrated audit transparency for GCP and G Suite

Google Groups is a critical tool to control access to your Google Cloud Platform (GCP) projects, and you’ve told us that having Google Group audit logs available in Cloud Audit Logs would help streamline security and access monitoring. We’ve been working to unify these audit logs so you don’t have to integrate with multiple APIs to get a complete audit inventory of your GCP environment, and now, you can access the Google Groups audit logs right from within Cloud Audit Logs. This is an opt-in feature that you can turn on through the Data Sharing section under Legal & Compliance in the Admin console.

Using Google Groups to manage your organization’s data access


Google Groups are the recommended way to grant access to GCP resources when using IAM policies. Groups help you centralize access control, reduce duplication, delegate access management and scale your GCP environments securely. This launch is one of many investments we’re making to simplify using Google Groups within GCP.

Google Cloud Audit Logs


Cloud Audit Logs is a Stackdriver security offering that lets you answer the question “who did what, when and where?” for your GCP environment. It contains audit trails of administrative changes to, and data accesses of, your cloud resources by users.
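To make the “who did what, when and where” idea concrete, here is a toy sketch of filtering such a trail (the entry fields below only mimic the idea; they are not the actual Cloud Audit Logs schema):

```python
# Illustrative only: toy audit entries capturing "who, what, when,
# where". The field names do not match the real Cloud Audit Logs schema.
entries = [
    {"who": "alice@example.com", "what": "SetIamPolicy",
     "when": "2019-05-28T10:02:11Z", "where": "projects/prod"},
    {"who": "bob@example.com", "what": "storage.objects.get",
     "when": "2019-05-28T10:05:42Z", "where": "projects/prod"},
]

def actions_by(principal, log):
    # Answer "what did this user do?" by filtering on the actor.
    return [e["what"] for e in log if e["who"] == principal]

print(actions_by("alice@example.com", entries))  # ['SetIamPolicy']
```

With Google Groups events flowing into the same log, group membership changes become one more entry in this single trail instead of a separate API to query.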

At the core of security operations, Cloud Audit Logs makes it possible to identify threat patterns via Event Threat Detection, alert on security anomalies via Cloud Security Command Center, remediate incidents via Stackdriver Incident Response and Remediation, and satisfy compliance requirements such as the NIST 800-92 Guide to Computer Security Log Management.

A view into the future


As more customers adopt G Suite and GCP to modernize their collaboration tools and applications, you’ve asked us to provide a more unified and consistent management plane. That is why we are bringing group management directly into the Google Cloud Console, along with various streams of security logs, audit logs from Cloud Identity, and G Suite audit logs. For example, when a Cloud Identity or G Suite administrator adds a user or turns on a G Suite service, an audit log appears in both the G Suite Admin Audit Log and the GCP Admin Activity Audit Log. Likewise, when a user signs in to your domain, it’s recorded in both the G Suite Login Audit Log and GCP Cloud Audit Logs.

Monday, May 27, 2019

How Google Cloud CRE helps SaaS companies prevent unanticipated failures

At Google, our Customer Reliability Engineering (CRE) teams work with customers to help implement Site Reliability Engineering (SRE) practices to continually attain their reliability goals. This work often includes defining objectives and implementing operational best practices like blameless postmortems or analyzing error budget spend.

Following CRE practices is especially important when changes are made in the customer's product. But what about when changes are released within Google Cloud Platform (GCP), where the product runs? We’ve heard that you want to test your products against future GCP releases to ensure reliability and performance when the underlying cloud service changes. We are happy to announce that preview zones are now available to let you test your own production code against future releases of GCP.

We’ve been working recently with many of our SaaS company partners and we’re happy to announce that we’ve expanded our CRE for SaaS program to address these needs. You can see how it works here:



With this expansion, our SaaS partners who have enrolled in our CRE for SaaS program now have an option to run a copy of their production applications in the preview zone. This lets partners detect unanticipated failures of applications running on future releases of GCP services. We put a number of unreleased “Day 0 binaries,” our soon-to-be-released code, in this zone. Then partners can test their production applications against that code. This way, we can anticipate and avoid previously unknown failure modes before users encounter them, giving both us and our partners a chance to investigate the pending changes and address them.

BrightInsight (a Flex company), this year’s winner of the Google Cloud Healthcare Partner award, has been using the preview zone, and finds it helpful both in preventing unanticipated failures as well as supporting regulatory compliance requirements within the healthcare industry.

To use the preview zone, you’ll need to have defined your SLOs so that Google can integrate them with additional test frameworks. If you don’t have SLOs defined, we’ve built SLO Guide, a new tool to help you discover what you should measure based on common architectures and critical user journeys. It will help you quickly create SLOs that measure what your users actually care about. 
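As a back-of-the-envelope illustration of what an SLO buys you (the 99.9% target and 30-day window below are example values, not defaults from the SLO Guide): an availability target translates directly into an error budget you can spend on risk, including testing against future releases.

```python
# Illustrative only: converting an availability SLO into an error
# budget. The target and window are example values, not GCP defaults.
slo_target = 0.999            # 99.9% availability
window_minutes = 30 * 24 * 60  # a 30-day rolling window

error_budget_minutes = window_minutes * (1 - slo_target)
print(round(error_budget_minutes, 1))  # 43.2 minutes of allowed downtime
```

Once the budget is explicit, decisions like “can we absorb the risk of this release?” become arithmetic about remaining budget rather than guesswork.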

Saturday, May 4, 2019

UPS uses Google Cloud to build the global smart logistics network of the future

The power of data analytics and machine learning is making it possible for companies that have mastered entire industries to take the next step and digitally transform their business. One of my favorite examples is United Parcel Service (UPS), which started out as a messenger company in 1907 and has steadily grown to become the largest package delivery and specialized transportation and logistics company in the world.



With the advent of e-commerce, UPS has come to play an even greater role in the movement of goods around the globe, and yet this 112-year-old company is just getting started. The massive amounts of data underlying its operations provide the foundation for UPS to lead the way in implementing more efficient, profitable and forward-thinking approaches to running its business.

To fully appreciate the scale of the opportunity, it helps to start with the numbers:

  • Every day, UPS delivers 21 million packages in more than 220 countries worldwide. During the all-important holiday season, daily package volumes climb even higher.
  • The drivers who make that possible perform 120 pickup and dropoff stops daily.
  • The number of possible routes each driver can take from stop number one to stop number 120 is unthinkably large: a number 199 digits long.
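That 199-digit figure is easy to verify: the number of orderings of 120 stops is 120 factorial, which a couple of lines of Python can confirm.

```python
import math

# Number of possible orderings of 120 delivery stops: 120 factorial.
num_routes = math.factorial(120)

print(len(str(num_routes)))  # 199 digits
```

No amount of brute force enumerates a space that size, which is why route selection is an optimization problem rather than a search problem.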


Sifting through all of this data to select the single best, most efficient, and most cost-effective route is the perfect challenge for Google Cloud.

Working in collaboration with Google Cloud Platform (GCP), UPS was able to design routing software that tells the delivery driver exactly where to go, every step of the way. The routing software saves the company up to $400 million a year, and reduces fuel consumption by 10 million gallons a year.

At our Google Cloud Next ‘19 conference last month, Juan Perez, Chief Information Officer at UPS, talked about how the work we’re doing together is transforming the company’s smart logistics network. “We’re grateful for the opportunity to collaborate with great partners like Google in a way that lets us use our joint expertise to bolster visibility across supply chains around the world.”

This is the power of analytics at scale, and it’s just the beginning. Today, Google Cloud’s BigQuery also helps UPS power the most precise and comprehensive forecasting in the company’s history. GCP provides the capacity to run machine learning models across 1 billion data points per day, including package weight, shape and size, and facility capacity across the network. The insights extracted from that data help inform UPS on how to load delivery vehicles, make more targeted operations adjustments, and minimize forecast uncertainty, especially around the holidays.

Friday, May 3, 2019

Google Cloud networking: Simplify routing between your VPCs with VPC peering

Google Cloud networking products and services fall into five main pillars: ‘Connect,’ ‘Scale,’ ‘Secure,’ ‘Optimize,’ and ‘Modernize.’ At Google Cloud Next ‘19 we announced several additions to our networking portfolio, and heard from customers, prospects and partners who wanted to learn more about the technical aspects of these announcements. What follows is a deep dive into the Connect pillar, exploring the enhanced routing capabilities in Google Cloud VPC. Stay tuned in the coming weeks as we explore the Google Cloud networking pillars in depth.

Network routing is about creating reliable paths between multiple networks by exchanging IP address information, where each network is either a remote network behind some type of hybrid connectivity service or a Virtual Private Cloud (VPC) network.

Today, we thought we would share a little more insight into how to use a new VPC peering capability to help you improve your on-prem connectivity to Google Cloud Platform (GCP), share VPNs across multiple VPCs, or access a third-party appliance on a peered VPC.

In Google Cloud, a VPC is global, so VPC peering is not needed to communicate between regions. Still, organizations may want to separate their deployments into different VPCs for isolation, and in that case VPC peering is ideal for keeping those entities connected. But until now, you could only exchange subnet routes with VPC peering. For example, if you learned a BGP dynamic route in one VPC via Cloud Router, it wasn’t visible from or usable by any of its peered VPCs.

At Google Cloud Next '19, we announced that you can now exchange any type of routes between two peered VPC networks, including static routes, subnet routes and dynamic BGP routes. Let’s look at a couple of use cases where it might be useful.

Using a peered VPC service with static routes


Many applications and services use static routes instead of subnet routes for connectivity. An example is using Equal Cost Multi-Path (ECMP) with static routes to load-balance traffic across multiple third-party appliances. Starting now, you can set up your VPC peering so that two VPCs exchange their static routes, making those appliances reachable from the peered VPC. You can do this by configuring import/export policies on a VPC peering connection; by default, only subnet routes are exchanged across peers.

In the following example, there are two VPC networks. VPC-A is peered with VPC-B, and a static route is created on VPC-B. VPC-B exports that route and VPC-A imports it, making the static route visible in VPC-A.
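The exchange rule itself is simple: a custom route crosses the peering only when the owning VPC exports it and the peer imports it. Here is a toy model of that behavior (illustrative names and logic, not a GCP API):

```python
# Toy model of route exchange over VPC peering (illustrative, not a
# GCP API). Subnet routes are always exchanged; a custom route such as
# a static route crosses the peering only if the owner exports it and
# the peer imports it.
def routes_visible_in_peer(subnet_routes, custom_routes,
                           export_custom, import_custom):
    visible = set(subnet_routes)
    if export_custom and import_custom:
        visible |= set(custom_routes)
    return visible

# VPC-B's routes, as seen from VPC-A with import/export enabled:
vpc_b_subnets = {"10.0.2.0/24"}
vpc_b_static = {"192.168.0.0/24"}  # static route to an appliance

print(routes_visible_in_peer(vpc_b_subnets, vpc_b_static,
                             export_custom=True, import_custom=True))
```

Flip either flag off and only the subnet route remains visible, which is exactly the pre-announcement behavior the post describes.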

Better connectivity from an on-prem network


Imagine that you have two VPCs connected via VPC peering and you would like to reach both of them from an on-prem network with a single VPN. This is a very common use case as many managed services in GCP use VPC peering, including Cloud SQL. (Note: To better understand the existing types of services and connections in GCP, check out this Google Cloud Next '19 breakout session on how to privately access your Google Cloud or third-party managed services.)

Connecting those VPCs from an on-prem network means that you need the on-prem routes to be advertised to both VPCs. In the example below, VPC-A is connected to an on-prem network and to another VPC, VPC-B. On-prem routes are exported to VPC-B through VPC-A, resulting in connectivity between the on-prem network and both VPCs.

You can use this functionality to share a single on-prem hybrid connection such as a VPN tunnel or an interconnect between multiple VPC networks, by creating a transit VPC.

What’s next for VPC connectivity


As enterprises migrate different types of workloads, public cloud providers’ networking topologies will become more complex. GCP routing solutions like VPC peering will continue to become more flexible with extensible policy filters to fine-tune your connectivity and security boundaries. In a way, VPC peering inherits many attributes of traditional routing protocols like BGP.

Thursday, May 2, 2019

Cloud rendering platform Zync Render gets a major update

Zync Render, part of Google Cloud Platform (GCP), is our cloud-hosted rendering platform that helps visual effects and animation studios realize their creative vision. Zync Render has helped render everything from major Hollywood feature films and TV advertising to brand design.

Because our mission is to enable the users behind these projects to create visually stunning content, we’ve spent the last several months optimizing our core engineering infrastructure. We’re excited to launch Zync version 2.0 on GCP. This release is a complete Google-native rewrite of the application, providing benefits such as faster job start-up times, increased compute scalability, and several other new features that users have identified as critical to their workflows.



Zync also now offers up to 48,000 CPU rendering cores, allowing even the largest jobs to compute quickly and efficiently. Additionally, we’ve implemented the ability to set usage quotas on a per-site, per-project and per-user basis, giving more control to larger organizations with multiple locations and artists. Here’s a look at Zync user quotas:

Zync has taken advantage of the multitude of GPU offerings available on GCP. These work with some of the leading GPU rendering software vendors, so users can render on high-performing cloud resources that deliver better performance than they could typically achieve on-premises.

Additional Zync updates include supporting Chaos Group’s V-Ray for Maxon Cinema 4D, one of the most popular renderers on the Maxon platform, and reducing the price for all our V-Ray supported offerings by up to 37 percent for more cost-effective project rendering on GCP.