pyspark.sql.SparkSession
class pyspark.sql.SparkSession(sparkContext: pyspark.context.SparkContext, jsparkSession: Optional[py4j.java_gateway.JavaObject] = None, options: Dict[str, Any] = {})

The entry point to programming Spark with the Dataset and DataFrame API.

A SparkSession can be used to create DataFrame, register DataFrame as tables, execute SQL over tables, cache tables, and read parquet files. To create a SparkSession, use the following builder pattern.

Changed in version 3.4.0: Supports Spark Connect.
Examples
Create a Spark session.
>>> spark = (
...     SparkSession.builder
...     .master("local")
...     .appName("Word Count")
...     .config("spark.some.config.option", "some-value")
...     .getOrCreate()
... )
Create a Spark session with Spark Connect.
>>> spark = (
...     SparkSession.builder
...     .remote("sc://localhost")
...     .appName("Word Count")
...     .config("spark.some.config.option", "some-value")
...     .getOrCreate()
... )
Methods
createDataFrame(data[, schema, …])
    Creates a DataFrame from an RDD, a list, a pandas.DataFrame or a numpy.ndarray.
getActiveSession()
    Returns the active SparkSession for the current thread, returned by the builder.
newSession()
    Returns a new SparkSession as a new session, which has separate SQLConf, registered temporary views and UDFs, but shared SparkContext and table cache.
range(start[, end, step, numPartitions])
    Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.
sql(sqlQuery[, args])
    Returns a DataFrame representing the result of the given query.
stop()
    Stops the underlying SparkContext.
table(tableName)
    Returns the specified table as a DataFrame.

Attributes
catalog
    Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.
conf
    Runtime configuration interface for Spark.
read
    Returns a DataFrameReader that can be used to read data in as a DataFrame.
readStream
    Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.
sparkContext
    Returns the underlying SparkContext.
streams
    Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.
udf
    Returns a UDFRegistration for UDF registration.
version
    The version of Spark on which this application is running.