Saturday, December 26, 2020

GCP — Migrating Teradata to BigQuery

  • Create a Teradata source system running on a Compute Engine instance
  • Prepare the Teradata source system for the schema and data transfer
  • Configure the schema and data transfer service
  • Migrate the schema and data from Teradata to BigQuery
  • Translate Teradata SQL queries into compliant BigQuery Standard SQL
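In practice the query translation step is handled by Google's migration tooling, but as a toy illustration of the kind of dialect rewrites involved (the rules below are a hypothetical, incomplete sample, not the real translator):

```python
# A toy sketch of Teradata-to-BigQuery SQL keyword rewrites.
# This is NOT the real migration tooling -- BigQuery's batch SQL
# translation service handles far more cases. Illustrative only.
import re

REWRITES = [
    (r"^\s*SEL\b", "SELECT"),      # Teradata allows SEL as shorthand for SELECT
    (r"\bFORMAT\s+'[^']*'", ""),   # Teradata's FORMAT clause has no direct equivalent
]

def toy_translate(teradata_sql: str) -> str:
    sql = teradata_sql
    for pattern, replacement in REWRITES:
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql.strip()

print(toy_translate("SEL emp_id FROM hr.employees"))
# SELECT emp_id FROM hr.employees
```

A real migration would run queries through the BigQuery translation service rather than regex rules like these.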


More Details: 

https://iamvigneshc.medium.com/gcp-migrating-teradata-to-bigquery-a655c44b2dbd

https://github.com/IamVigneshC/GCP-Migrating-Teradata-to-BigQuery

Thursday, December 24, 2020

Machine Learning on AWS - Implement a Data Ingestion Solution Using Amazon Kinesis Video Streams


Implement a Data Ingestion Solution Using Amazon Kinesis Video Streams

Create a Kinesis Video Stream for Media Playback

Globomantics is an analytics firm that handles computer vision projects for object detection, image classification, and image segmentation. As a Data Architect, your role is to stream real-time video feeds from a source to AWS for further analytics. You'll create a Kinesis Video Stream into which live video feeds will later be ingested.

  1. Log in to the AWS Console.

  2. Under Find Services, type in and then click Kinesis Video Streams.

  3. Click on Create.

  4. For Video stream name enter VP52M8OQZ10HOQRB, and click on Create video stream.

You will be at the Video streams page for VP52M8OQZ10HOQRB, and in the Video stream info tab the Status will be Active.
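The same stream can be created programmatically. Here is a minimal boto3 sketch; it assumes AWS credentials are configured, and the retention and media type values are illustrative defaults, not lab requirements (the live call is commented out so the snippet stays self-contained):

```python
# Sketch of the console steps above using boto3 (Kinesis Video Streams).
def video_stream_request(name: str, retention_hours: int = 24) -> dict:
    """Build the create_stream arguments; retention and MediaType
    are illustrative assumptions, not values the lab specifies."""
    return {
        "StreamName": name,
        "DataRetentionInHours": retention_hours,
        "MediaType": "video/h264",
    }

# With the boto3 package installed and credentials configured:
# import boto3
# kvs = boto3.client("kinesisvideo")
# kvs.create_stream(**video_stream_request("VP52M8OQZ10HOQRB"))
```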

Configure Java SDK Producer Library to Stream Video Feeds

  1. In the upper-left click Services, then type in and click on EC2.

  2. Click on Instances (running) and select the instance AnalyticsEngine.

  3. Click on Connect, and at the EC2 Instance Connect tab click Connect. Note: A new browser tab (or window) will open to a Linux command prompt. The EC2 instance was created for you when you started this lab, and its OS is Ubuntu.

  4. At the command prompt, enter:

     git clone https://github.com/ps-interactive/lab_aws_implement-data-ingestion-solution-using-amazon-kinesis-video-streams.git

     Note: This clones the Amazon Kinesis Video Streams Producer SDK into /home/ubuntu.

  5. Now enter cd lab_aws_implement-data-ingestion-solution-using-amazon-kinesis-video-streams/

Enter the following commands:

   sudo apt update -y 

   sudo apt install maven -y

   sudo apt install default-jdk -y

   sudo apt install git-all -y

Note: Enter the commands in the given order. They install the required applications to build and run the producer.

  • To compile and assemble the producer, enter the command mvn clean compile assembly:single

  • Run the following command, replacing <Access Key ID> and <Secret Access Key> with your credentials:

   java -classpath target/amazon-kinesis-video-streams-producer-sdk-java-1.11.0-jar-with-dependencies.jar -Daws.accessKeyId=<Access Key ID> -Daws.secretKey=<Secret Access Key> -Dkvs-stream=VP52M8OQZ10HOQRB -Djava.library.path=/home/ubuntu/lab_aws_implement-data-ingestion-solution-using-amazon-kinesis-video-streams/src/main/resources/lib/ubuntu/ com.amazonaws.kinesisvideo.demoapp.DemoAppMain

DEBUG lines will be output, indicating the creation of a continuous flow of video frames to the video stream you made in the last challenge.

Check the Media Playback for the Kinesis Video Stream Created

  1. Back in the AWS Console browser tab, in the upper-left click Services, then type in and click on Kinesis Video Streams.

  2. In the left-hand menu click Video streams, then click on the VP52M8OQZ10HOQRB link.

  3. Expand the Media playback section.

You'll observe a real-time video feed from the producer library: a video of a building with passing traffic.

Machine Learning on AWS - Implement a Data Ingestion Solution Using Amazon Kinesis Data Streams

 

Implement a Data Ingestion Solution Using Amazon Kinesis Data Streams

Create a Kinesis Data Stream

You are a data science consultant for a company called Globomantics, analyzing live temperature feeds. Your primary responsibility is to gather real-time data from temperature sensors and ingest it into a Kinesis Data Stream so the logs can be further analyzed. You will configure the Kinesis Data Stream, starting with one shard.

  1. Log in to the AWS Console.

  2. Under Find Services, type in and then click Kinesis.

  3. Ensure Kinesis Data Streams is selected, and then click the Create data stream button.

  4. For Data stream name enter RawStreamData.

  5. For Number of open shards enter 1.

  6. Click Create data stream.

Wait about a minute until the Status of your data stream is Active, at which point it will be ready to accept a sequence of records.
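Programmatically, the same stream could be created with boto3. A minimal sketch, assuming configured credentials (the live calls are commented out so the snippet stays self-contained):

```python
# Sketch of the console steps above using the boto3 Kinesis client.
def data_stream_request(name: str, shards: int = 1) -> dict:
    """Build the create_stream arguments for a Kinesis Data Stream."""
    return {"StreamName": name, "ShardCount": shards}

# With the boto3 package installed and credentials configured:
# import boto3
# kinesis = boto3.client("kinesis")
# kinesis.create_stream(**data_stream_request("RawStreamData", 1))
# kinesis.get_waiter("stream_exists").wait(StreamName="RawStreamData")
```

The waiter mirrors the "wait about a minute until Active" step in the console.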

Connect to an EC2 and Configure Live Temperature Feeds to the Data Stream

Schedule a Python script to send live temperature feeds using the Kinesis API to the data stream you created in the previous challenge.

  1. In the upper-left, click Services, enter EC2 into the search, and click EC2.

  2. In the left panel, under Instances click Instances. Note: You will see an instance named AnalyticsEngine in a Running state, which was created for you when you started this lab.

  3. Select the instance AnalyticsEngine, click Connect, ensure the EC2 Instance Connect tab is selected, then click Connect. Note: A new browser tab will open to a Linux command prompt.

  4. At the command prompt, enter the following two commands, replacing the values inside the quotes with the CLI CREDENTIALS values provided by this lab.

    export AWS_ACCESS_KEY_ID=''

    export AWS_SECRET_ACCESS_KEY=''

Note: For example, the first command would look something like

  export AWS_ACCESS_KEY_ID='AKIASY3GMJRF5PXADOMT'
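If the producer script later fails to authenticate, a quick way to confirm both variables were actually exported is a check like this (a small helper for sanity-checking, not part of the lab):

```python
# Verify the exported AWS credentials are visible in the environment
# that the producer script will run in.
import os

def have_aws_creds() -> bool:
    """True if both AWS credential variables are set and non-empty."""
    return all(os.environ.get(k)
               for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"))
```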
  5. Enter cat > sensorstream.py, paste in this sensorstream.py source code, press Enter, then press Ctrl+D. Note: This command creates the script you will execute next; there are other ways to do this, such as using vi.

  6. Run the command python sensorstream.py

Note: This ingests live temperature feeds into your Kinesis Data Stream using the Python Kinesis API. If you get an error that ends with something similar to boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV4Handler'] Check your credentials, then there was an issue with task 4. Double-check your entries and redo that task.

This will start generating temperature feeds, and you will observe continuous sensor data including "iotValue".
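The lab's sensorstream.py source isn't reproduced here, but a minimal producer of the same shape would look roughly like the following. The field names, including iotValue, and the value ranges are assumptions based on the output described above, not the script's actual contents (the send loop is commented out so the sketch stays self-contained):

```python
# Minimal sketch of a temperature-feed producer like sensorstream.py.
# Field names and ranges are ASSUMPTIONS based on the lab's described
# output ("iotValue"), not the real script.
import json
import random
import time

def make_record(sensor_id: str) -> dict:
    """Build one simulated temperature reading."""
    return {
        "sensorId": sensor_id,
        "iotValue": round(random.uniform(15.0, 35.0), 2),
        "timestamp": int(time.time()),
    }

# With the boto3 package installed, the send loop would be roughly:
# import boto3
# kinesis = boto3.client("kinesis")
# while True:
#     rec = make_record("sensor-1")
#     kinesis.put_record(StreamName="RawStreamData",
#                        Data=json.dumps(rec),
#                        PartitionKey=rec["sensorId"])
#     time.sleep(1)
```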



Monitor Incoming Data to the Kinesis Data Stream

You generated live temperature feeds and connected them to your Kinesis Data Stream using the Kinesis API. Now you will monitor the incoming traffic being ingested into the Kinesis Data Stream and, based on the increase in load, increase the number of shards.

  1. Go back to the AWS Console browser tab.

  2. In the upper-left, click Services, enter Kinesis into the search, and click Kinesis.

  3. In the left-hand menu click Data streams, then click the RawStreamData link.

  4. At the RawStreamData page, if needed, click the Monitoring tab. Note: There are various Stream metrics which you can scroll down and see, such as Incoming data - sum (Bytes), Incoming data - sum (Count), Put record - sum (Bytes), Put record latency - average (Milliseconds), and Put record success - average (Percent).

  5. Hover over the Incoming data - sum (Bytes) panel, in its upper-right click the three vertical dots, then click on View in metrics.

  6. In the new browser tab that opens to the Metric page, select a Number graph. Note: Wait if needed until IncomingRecords is above 1k, which in this scenario indicates you need more resources to handle the streaming data. The following steps show you how to do this by increasing the number of shards.

  7. Go back to the browser tab open to the RawStreamData page, and click the Configuration tab.

  8. Click on Edit, increase the Number of open shards to 2, then click Save changes. Note: This will handle a greater amount of streaming data.

After about a minute, you will see a panel saying Stream capacity was successfully updated for this data stream.
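The console resharding edit has an API equivalent, Kinesis UpdateShardCount with uniform scaling. A minimal boto3 sketch, assuming configured credentials (the live call is commented out so the snippet stays self-contained):

```python
# Sketch of the console shard increase using the Kinesis API.
def reshard_request(name: str, target_shards: int) -> dict:
    """Build the update_shard_count arguments (uniform scaling
    splits/merges shards evenly across the stream)."""
    return {
        "StreamName": name,
        "TargetShardCount": target_shards,
        "ScalingType": "UNIFORM_SCALING",
    }

# With the boto3 package installed and credentials configured:
# import boto3
# boto3.client("kinesis").update_shard_count(**reshard_request("RawStreamData", 2))
```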