Spark Streaming with Python under 12 minutes


Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.

Chapters
0:00 Intro
0:35 What are streaming pipelines? Spark Streaming
1:39 Navigating the Spark Streaming documentation
2:49 Real-life Spark Streaming example architecture (Twitter and PySpark Streaming)
3:40 Setting up the cloud environment for Spark Streaming
4:20 Coding TweetsListener.py using tweepy
7:00 Coding PySpark Streaming Pipeline
9:12 Running the Spark Streaming pipeline
11:40 Outro

Links:
Repo Link: https://github.com/syalanuj/youtube/t...
Spark Documentation: https://spark.apache.org/docs/latest/
Spark Streaming (DStreams): https://spark.apache.org/docs/latest/...
Spark Streaming Structured Streaming: https://spark.apache.org/docs/latest/...
Medium Blog Post link:   / spark-streaming-with-python  
My blog post link: https://anujsyal.com/spark-streaming-...

Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.

Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data. DStreams can be created either from input data streams from sources such as Kafka and Kinesis, or by applying high-level operations on other DStreams. Internally, a DStream is represented as a sequence of RDDs.
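To make the "sequence of RDDs" idea concrete, here is a plain-Python simulation of a DStream word count: each micro-batch stands in for one RDD, and the per-batch counting mirrors what `flatMap`/`map`/`reduceByKey` would do in PySpark. This is a conceptual sketch with made-up batch contents, not the pyspark API.

```python
from collections import Counter

# Each micro-batch stands in for one RDD in the DStream's sequence of RDDs.
micro_batches = [
    ["spark streams data", "spark is fast"],  # batch arriving at t=0
    ["streams of data"],                      # batch arriving at t=1
]

def word_count(batch):
    # flatMap: split each line into words; map + reduceByKey: count them.
    words = [word for line in batch for word in line.split()]
    return dict(Counter(words))

# Process batches in arrival order, as Spark Streaming would on each interval.
counts_per_batch = [word_count(b) for b in micro_batches]
print(counts_per_batch[0]["spark"])  # "spark" appears twice in the first batch
```

In real PySpark the same steps would be `lines.flatMap(lambda l: l.split(" ")).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)` on a DStream created from a source such as a TCP socket.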

Spark's documentation includes full integration guides for connecting to different data sources.
The documentation covers two options:
Spark Streaming (DStreams) - the original API, built on a sequence of RDDs.
Structured Streaming - uses a SparkSession and runs stream processing on top of the Spark SQL engine; this should be the go-to choice.
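For the Structured Streaming option, a minimal socket word count can be sketched as below, following the pattern from the Spark programming guide. The host/port values are placeholders, and the pyspark imports sit inside the function so the sketch reads as a plan rather than something that needs a Spark environment to load.

```python
def build_wordcount_query(host="localhost", port=9999):
    """Build (but do not start) a Structured Streaming word-count query.

    Assumes pyspark is installed; host/port point at a text socket source.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

    # readStream returns an unbounded DataFrame backed by the socket source.
    lines = (spark.readStream.format("socket")
             .option("host", host).option("port", port).load())

    # Split each line into words, then count occurrences across the stream.
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()

    # Emit the full updated result table to the console on every trigger.
    return counts.writeStream.outputMode("complete").format("console")

# Usage (needs a socket server first, e.g. `nc -lk 9999`):
# query = build_wordcount_query().start()
# query.awaitTermination()
```

Note how the streaming query is just DataFrame/SQL operations; that reuse of the Spark SQL engine is why Structured Streaming is the recommended option.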


FOLLOW ME ON
MEDIUM: / syal.anuj
GITHUB: https://github.com/syalanuj
WEBSITE: https://anujsyal.com


#pyspark #streaming #realtimepipelines #python #data #spark #sparksql #sparkstreaming #dataengineering #datapipelines #datastreaming
