Slect

Wadson Guimatsa

A blog about Data-Intensive Applications
Written by Wadson Guimatsa

.NET for Apache Spark - Writing Data

October 15, 2019

By the end of this post, you should know how to read from and write to the following core Spark data sources:
 1. Parquet.
 2. JDBC/ODBC.
 3. JSON.
 4. CSV.
 5. Text.

Apache Spark - Architecture Part 1: Driver and Executors

September 24, 2019

By the end of this post, you should know how to do the following:
 1. Define the different cluster management systems Spark can run on.
 2. List the major components of Apache Hadoop YARN.
 3. Explain the relationship between the major components of YARN.
 4. Explain the functions of the Driver and Executors in Apache Spark.
 5. Explain how Spark uses cluster resources to process big data.

.NET for Apache Spark - DataFrame, Part 4: Complex Types

September 01, 2019

By the end of this post, you should know how to do the following:
 1. List the different complex types available in Apache Spark.
 2. Query a DataFrame that has complex data types.

.NET for Apache Spark - DataFrame, Part 3: Basic Transformations

August 25, 2019

This post introduces you to the fundamentals of working with DataFrames.
By the end of this post, you should know how to do the following with a DataFrame:
 1. Select specific columns.
 2. Filter rows.
 3. Add a Column.
 4. Remove a Column.
 5. Rename a Column.
 6. Change the Datatype of a Column.

.NET for Apache Spark - DataFrame, Part 2: Transformations

August 18, 2019

In Apache Spark, business logic is expressed using transformations.
This blog post is a theoretical introduction to DataFrame transformations.

.NET for Apache Spark - DataFrame, Part 1: Fundamentals

August 11, 2019

By the end of this post, you should know how to do the following:
 1. Describe what a DataFrame is in Apache Spark.
 2. Explain what rows and columns are.

.NET for Apache Spark - Loading Data, Part 3: RDBMS

August 04, 2019

By the end of this post, you should know how to do the following:
 1. Connect to SQL databases using JDBC.
 2. Specify a query that will be used to read data into Apache Spark.
 3. Set a custom schema to use for reading data from a JDBC data source.

.NET for Apache Spark - Loading Data, Part 2: JSON Files

July 30, 2019

By the end of this post, you should know how to do the following:
 1. Read JSON files.
 2. Configure options for the JSON format.
 3. Specify a complex DDL-formatted schema for a JSON file.
 4. Handle files with corrupted records.

.NET for Apache Spark - Loading Data, Part 1: CSV Files

July 25, 2019

By the end of this post, you should know how to do the following:
 1. Read a CSV file using format() and load().
 2. Configure options for the CSV format.
 3. Specify a DDL-formatted schema for a CSV file.

.NET for Apache Spark - Getting Started

June 01, 2019

Apache Spark is an open-source analytics engine for big data and machine learning.
Applications for Spark can be written in Scala, Python, R, and SQL, and now also for .NET Core using C# and F#.
In this video, I will show you how to install Apache Spark on Windows and write an application using .NET Core and C# to test the installation.

SQL Server - Full text search with the Contains predicate

March 06, 2019

By the end of this video, you will be able to write queries to perform the following types of full-text searches:
1. Simple Term: find specific words (05:17)
2. Prefix Term: query for words or phrases starting with a specific text (24:33)
3. Generation Term: search for multiple forms of a specific word (17:55)
4. Synonymous Term: find words with similar meaning (28:58)

SQL Server - Introduction to full-text search

February 01, 2019

Searching is the most fundamental thing we do with a database. Most of the time we know the shape of what we are looking for, e.g. an ID or a specific piece of text, but what if we can't state exactly what we are looking for?