[GCP ML engineer certification Day1]

Google Cloud Platform Big Data and Machine Learning Fundamentals (Coursera Lecture)

Introduction to the Data and Machine Learning on Google Cloud Platform Specialization

Big Data Challenges

  1. Migrating existing data workloads (e.g., Hadoop and Spark jobs)
  2. Analyzing large datasets at scale
  3. Building streaming data pipelines
  4. Applying machine learning to your data

Background of Google Cloud Platform

  • Google had to develop the infrastructure to ingest, manage, and serve all the data from its applications (Google Maps, Gmail, Android, etc.), which serve billions of users
  • Google’s mission : Organize the world’s information and make it universally accessible and useful

Four fundamental aspects of Google’s core infrastructure

(figure: the four aspects of Google’s core infrastructure)

1. Compute Power

  • Google trains ML models on its own infrastructure and can deploy them all the way down to phone hardware
  • Leverage Google’s AI research with pre-trained AI building blocks (the ML APIs)

Creating a VM on Google Cloud Platform

  1. Create a VM instance (Compute Engine)
  2. Create a bucket for storage (bucket names must be globally unique)
  3. Copy files to the bucket (command: gsutil cp)
  4. Change the permissions of the data as needed
  5. Stop or delete the VM instance
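The steps above can be sketched as gcloud/gsutil commands (the instance, zone, bucket, and file names below are placeholders):

```shell
# 1. Create a Compute Engine VM instance
gcloud compute instances create my-vm --zone=us-central1-a

# 2. Create a Cloud Storage bucket -- the name must be globally unique
gsutil mb gs://my-unique-bucket-name/

# 3. Copy a local file into the bucket
gsutil cp data.csv gs://my-unique-bucket-name/

# 4. Change data permissions, e.g. make one object publicly readable
gsutil acl ch -u AllUsers:R gs://my-unique-bucket-name/data.csv

# 5. Stop or delete the VM when finished
gcloud compute instances stop my-vm --zone=us-central1-a
gcloud compute instances delete my-vm --zone=us-central1-a --quiet
```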

2. Storage

  • One major way that Cloud Computing differs from desktop computing is that “Compute” and “Storage” are independent.
  • Getting data off individual VM instances and into shared storage makes it available to the rest of your solution, where it can be transformed for your purposes.
  • Data engineers build data pipelines so that clean data is available before machine learning models are built on it.

Storage Class

(figure: Cloud Storage classes)

All storage classes offer multi-region, dual-region, and region location options.
They differ in access speed and cost.
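As a sketch, buckets can be created with an explicit storage class and location via gsutil’s `-c` and `-l` flags (bucket names below are placeholders):

```shell
# Standard class in a single region
gsutil mb -c standard -l us-central1 gs://example-standard-bucket/

# Nearline class in a multi-region location
gsutil mb -c nearline -l us gs://example-nearline-bucket/

# Coldline class in a dual-region location (nam4 = Iowa + South Carolina)
gsutil mb -c coldline -l nam4 gs://example-coldline-bucket/

# Archive class for rarely accessed data
gsutil mb -c archive -l us-east1 gs://example-archive-bucket/
```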

Google Cloud Platform resource hierarchy

(figure: GCP resource hierarchy)

  • The organization is the root node of the entire GCP hierarchy; it is required in order to use folders, and policies set at the organization level apply to everything below it.
  • A project is the base-level organizing entity for creating and using resources and services.
  • Projects let you collaborate with many other teams in the organization across many projects.
  • Cloud Identity and Access Management (Cloud IAM) fine-tunes access control over all GCP resources through IAM policies.
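A minimal sketch of fine-tuning access with Cloud IAM from the command line (the project ID and member email are placeholders):

```shell
# Grant a user the BigQuery Data Viewer role on a project
gcloud projects add-iam-policy-binding example-project-id \
  --member="user:analyst@example.com" \
  --role="roles/bigquery.dataViewer"

# Inspect the project's current IAM policy
gcloud projects get-iam-policy example-project-id
```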

3. Networking

  • Google’s private network carries as much as 40% of the world’s Internet traffic every day.
  • Google’s data center network speed enables the separation of compute and storage.

(figure: Google’s network)

4. Security

(figure: Google’s layered security)

  • Google handles many of the lower layers of security, like the physical security of the hardware and its premises, the encryption of data on disk, and the integrity of the physical network.
  • Communications to Google Cloud are encrypted in transit (multiple layers of security)
  • Stored data is encrypted at rest and distributed for availability and reliability

    BigQuery

  • BigQuery table data encrypted with keys (and those keys are also encrypted)
  • Monitor and flag queries for anomalous behavior
  • Limit data access with authorized views.
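A hedged sketch of an authorized view using the bq command-line tool (the project, dataset, and table names are placeholders):

```shell
# Create a dataset to hold the view, then a view exposing only selected columns
bq mk shared_views
bq mk --use_legacy_sql=false \
  --view='SELECT name, city FROM `example-project.private_data.customers`' \
  shared_views.customers_view

# To authorize the view against the source dataset, export the dataset's
# access policy, append a "view" entry for shared_views.customers_view,
# and apply it back:
bq show --format=prettyjson example-project:private_data > policy.json
# ...edit policy.json to add the view to the "access" list, then:
bq update --source policy.json example-project:private_data
```

Users granted access to `shared_views` can then query the view without having any access to the underlying `private_data` dataset.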

Compute

  • Compute Engine
    • Google’s IaaS (Infrastructure as a Service) solution
    • provides maximum flexibility for people who prefer to manage server instances themselves
  • GKE(Google Kubernetes Engine)
    • Clusters of machines running containers
    • lets you run containerized applications in a cloud environment
    • Kubernetes is a way to orchestrate code that’s running in containers
    • Containerization is a way to package code that’s designed to be highly portable and to use resources very efficiently
  • App Engine
    • GCP’s fully managed PaaS (Platform as a Service) framework
    • just focus on your code and let Google deal with all the provisioning and resource management
    • used for long-lived web applications
  • Cloud Functions
    • completely serverless execution environment
    • FaaS (Function as a Service)
    • used for code that’s triggered by an event such as a new file hitting cloud storage
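For example, a function triggered by a new file landing in Cloud Storage might be deployed like this (the function name, runtime, and bucket name are placeholders):

```shell
# Deploy a Cloud Function that fires whenever an object is created
# (finalized) in the given bucket
gcloud functions deploy on_new_file \
  --runtime=python39 \
  --trigger-resource=my-unique-bucket-name \
  --trigger-event=google.storage.object.finalize
```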

Storage

  1. GCP offers relational and non-relational databases and worldwide object storage
  2. GCP storage reduces the work it takes to store different kinds of data
    • Cloud Bigtable
    • Cloud Storage
    • Cloud SQL
    • Cloud Spanner
    • Cloud Datastore

Big Data

GCP offers fully managed big data and machine learning services.

Big data products on Google Cloud Platform

(figure: big data products on Google Cloud Platform)

Case Study

  1. What were the barriers or challenges the customer faced?
  2. How were these challenges solved with a cloud solution? What products did they use?
  3. What was the business impact?

Key roles in a data-driven organization

A common mistake that companies make is that they go out and hire 10 PhD machine learning scientists, and expect magic to happen. They focus on the ML researchers, and forget about all the help and guidance that the ML researchers will need.

Big data roles

(figure: roles in a data-driven organization)

  • Data engineers, to build the pipelines and get you clean data.
  • Decision makers, to decide how deep you want to invest in a data-driven opportunity while weighing the benefits for the organization.
  • Analysts, to explore the data for insights and potential relationships that could be useful as features in a machine learning model.
  • Statisticians, to help make your data-inspired decisions become true data-driven decisions, with their added rigor.
  • Applied machine learning engineers, who have real-world experience building production machine learning models from the latest and best information and research by the researchers.
  • Data scientists, who have the mastery over analysis, statistics, and machine learning.
  • Analytics managers to lead the team.
  • Social scientists and ethicists, to ensure that the quantitative impact is there for your project and that it’s the right thing to do.
