Azure Databricks #spark #pyspark #azuredatabricks #azure
In this video, I discuss how to use the ArrayType data type and the array() and array_contains() functions in PySpark.
1. ArrayType data type in PySpark
2. array() function in PySpark
3. array_contains() function in PySpark
Create dataframe:
======================================================
from pyspark.sql.types import StringType, ArrayType, StructType, StructField

# Each row: name, skills (array), workprofile (array), currentState, previousState
data = [
    ("James,,Smith", ["ADF", "Scala", "Pyspark"], ["Pyspark", "ADF"], "OH", "CA"),
    ("Michael,Rose,", ["ADF", "SQL", "Pyspark"], ["ADF", "SQL"], "NY", "NJ"),
    ("Robert,,Williams", ["SQL", "SSIS"], ["SQL", "SSIS"], "UT", "NV")
]

# ArrayType(StringType()) declares a column that holds an array of strings
schema = StructType([
    StructField("name", StringType(), True),
    StructField("skills", ArrayType(StringType()), True),
    StructField("workprofile", ArrayType(StringType()), True),
    StructField("currentState", StringType(), True),
    StructField("previousState", StringType(), True)
])

# spark and display() are provided by the Databricks notebook
df = spark.createDataFrame(data=data, schema=schema)
display(df)
-----------------------------------------------------------------------------------------------------------------------
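# Import only what is used; a wildcard import shadows Python builtins such as sum and max
from pyspark.sql.functions import array, array_contains

# array() combines several columns of the same type into one array column
df1 = df.withColumn('States', array(df.currentState, df.previousState))
display(df1)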
---------------------------------------------------------------------------------------------------------------------
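# array_contains() returns true when the array holds the value, false otherwise, and null for a null array
df2 = df.withColumn('array_contains', array_contains(df.skills, 'ADF'))
display(df2)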
============================================================
Learn PySpark, an interface for Apache Spark in Python. PySpark is often used for large-scale data processing and machine learning.
1. pyspark introduction | pyspark tutorial for beginners | pyspark tutorial for data engineers
2. what is dataframe in pyspark | dataframe in azure databricks | pyspark tutorial for data engineer
3. How to read write csv file in PySpark | Databricks Tutorial | pyspark tutorial for data engineer
4. Different types of write modes in Dataframe using PySpark | pyspark tutorial for data engineers
5. read data from parquet file in pyspark | write data to parquet file in pyspark
6. datatypes in PySpark | pyspark data types | pyspark tutorial for beginners
7. how to define the schema in pyspark | structtype & structfield in pyspark | Pyspark tutorial
8. how to read CSV file using PySpark | How to read csv file with schema option in pyspark
9. read json file in pyspark | read nested json file in pyspark | read multiline json file
10. add, modify, rename and drop columns in dataframe | withcolumn and withcolumnrename in pyspark
11. filter in pyspark | how to filter dataframe using like operator | like in pyspark
12. startswith in pyspark | endswith in pyspark | contains in pyspark | pyspark tutorial
13. isin in pyspark and not isin in pyspark | in and not in in pyspark | pyspark tutorial
14. select in PySpark | alias in pyspark | azure Databricks #spark #pyspark #azuredatabricks #azure
15. when in pyspark | otherwise in pyspark | alias in pyspark | case statement in pyspark
16. Null handling in pySpark DataFrame | isNull function in pyspark | isNotNull function in pyspark
17. fill() & fillna() functions in PySpark | how to replace null values in pyspark | Azure Databrick
18. GroupBy function in PySpark | agg function in pyspark | aggregate function in pyspark
19. count function in pyspark | countDistinct function in pyspark | pyspark tutorial for beginners
20. orderBy in pyspark | sort in pyspark | difference between orderby and sort in pyspark
21. distinct and dropduplicates in pyspark | how to remove duplicate in pyspark | pyspark tutorial
Azure Databricks Tutorial playlist:
• Azure Databricks Tutorial
Azure data factory tutorial playlist:
• Azure Data factory (adf)
ADF interview question & answer:
• adf interview questions and answers f...