A blog about Data-Intensive Applications
Written by Wadson Guimatsa
By the end of this post, you should know how to read from and write to the following core Spark data sources:
1. Parquet
2. JDBC/ODBC
3. JSON
4. CSV
5. Text
By the end of this post, you should know how to do the following:
1. Define the different cluster management systems Spark can run on.
2. List the major components of Apache Hadoop YARN.
3. Explain the relationship between the major components of YARN.
4. Explain the functionalities of the Driver and Executors in Apache Spark.
5. Explain how Spark uses cluster resources to process big data.
By the end of this post you should know how to do the following:
1. List the different complex types available in Apache Spark.
2. Query a DataFrame that has complex datatypes.
This post introduces you to the fundamentals of working with DataFrames.
By the end of this post, you should know how to do the following with a DataFrame:
1. Select specific columns.
2. Filter rows.
3. Add a Column.
4. Remove a Column.
5. Rename a Column.
6. Change the Datatype of a Column.
In Apache Spark, business logic is expressed using Transformations.
This blog post is a theoretical introduction to DataFrame Transformations.
By the end you should know how to do the following:
1. Describe what a DataFrame is in Apache Spark.
2. Explain what rows and columns are.
By the end of this post, you should know how to do the following:
1. Connect to SQL databases using JDBC.
2. Specify a query that will be used to read data into Apache Spark.
3. Set a custom schema to use for reading data from a JDBC data source.
By the end of this post, you should know how to do the following:
1. Read JSON files.
2. Configure options for the JSON format.
3. Specify a complex DDL-formatted schema for a JSON file.
4. Handle files with corrupted records.
By the end of this post, you should know how to do the following:
1. Read CSV files using format() and load().
2. Configure options for the CSV format.
3. Specify a DDL-formatted schema for a CSV file.
Apache Spark is an open-source analytics engine for big data and machine learning. Spark applications can be written in Scala, Python, R, and SQL, and now also for the .NET Core framework using C# and F#. In this video, I will show you how to install Apache Spark on Windows and write an application using .NET Core and C# to test the installation.
By the end of this video, you will be able to write queries to perform the following types of full-text searches:
1. Simple Term: find specific words (05:17)
2. Prefix Term: query for words or phrases starting with a specific text (24:33)
3. Generation Term: search for multiple forms of a specific word (17:55)
4. Synonymous Term: find words with similar meaning (28:58)
Searching is the most fundamental thing we do with a database. Most of the time we know the shape of what we are looking for, e.g. an ID or a specific piece of text, but what if you can't state exactly what you are looking for?