pyspark.sql.datasource.DataSourceStreamReader.read

abstract DataSourceStreamReader.read(partition)
Generates data for a given partition and returns an iterator of tuples or rows.

This method is invoked once per partition to read the data. Implementing this method is required for a stream reader. You can initialize any non-serializable resources required for reading data from the data source within this method.

Parameters
partition : InputPartition
    The partition to read. It must be one of the partition values returned by DataSourceStreamReader.partitions().
 
Returns
    iterator of tuples or Rows
        An iterator of tuples or rows. Each tuple or row will be converted to a row in the final DataFrame.
 
Notes

This method is static and stateless. You shouldn't access mutable class members or keep in-memory state between different invocations of read().
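A minimal sketch of the read() contract described above. To keep the example self-contained it uses plain classes rather than subclassing pyspark.sql.datasource.DataSourceStreamReader, and the RangePartition helper and SimpleStreamReader class are hypothetical names, not part of the PySpark API. In real use, read() would be invoked by Spark once per partition on the executors.

```python
class RangePartition:
    """Hypothetical InputPartition stand-in carrying a value range to read."""

    def __init__(self, start, end):
        self.start = start
        self.end = end


class SimpleStreamReader:
    """Sketch of a stream reader; real code would subclass
    pyspark.sql.datasource.DataSourceStreamReader."""

    def partitions(self, start_offset, end_offset):
        # Split the offset range [start_offset, end_offset) into two
        # partitions; read() receives one of these values per invocation.
        mid = (start_offset + end_offset) // 2
        return [RangePartition(start_offset, mid),
                RangePartition(mid, end_offset)]

    def read(self, partition):
        # Invoked once per partition. Any non-serializable resource
        # (e.g. a database connection) should be opened here, not in
        # __init__, because the reader itself is serialized to executors.
        # Yield tuples (or Rows); each becomes a row in the DataFrame.
        for value in range(partition.start, partition.end):
            yield (value,)


reader = SimpleStreamReader()
parts = reader.partitions(0, 4)
rows = [row for p in parts for row in reader.read(p)]
# rows == [(0,), (1,), (2,), (3,)]
```

Note that read() is written as a generator: it stays stateless across invocations, with all per-call state derived from the partition argument, matching the note above.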