pyspark.sql.functions.mask
pyspark.sql.functions.mask(col: ColumnOrName, upperChar: Optional[ColumnOrName] = None, lowerChar: Optional[ColumnOrName] = None, digitChar: Optional[ColumnOrName] = None, otherChar: Optional[ColumnOrName] = None) → pyspark.sql.column.Column
Masks the given string value. This can be useful for creating copies of tables with sensitive information removed.

- New in version 3.5.0.

- Parameters
- col: :class:`~pyspark.sql.Column` or str
- target column to compute on.
- upperChar: :class:`~pyspark.sql.Column` or str
- character to replace upper-case characters with (default 'X'). Specify NULL to retain the original character.
- lowerChar: :class:`~pyspark.sql.Column` or str
- character to replace lower-case characters with (default 'x'). Specify NULL to retain the original character.
- digitChar: :class:`~pyspark.sql.Column` or str
- character to replace digit characters with (default 'n'). Specify NULL to retain the original character.
- otherChar: :class:`~pyspark.sql.Column` or str
- character to replace all other characters with. Defaults to NULL, which retains the original character.
 
- Returns
- :class:`~pyspark.sql.Column`
- the input string with its characters masked according to the replacement rules above.
- Examples

>>> from pyspark.sql.functions import lit, mask
>>> df = spark.createDataFrame([("AbCD123-@$#",), ("abcd-EFGH-8765-4321",)], ['data'])
>>> df.select(mask(df.data).alias('r')).collect()
[Row(r='XxXXnnn-@$#'), Row(r='xxxx-XXXX-nnnn-nnnn')]
>>> df.select(mask(df.data, lit('Y')).alias('r')).collect()
[Row(r='YxYYnnn-@$#'), Row(r='xxxx-YYYY-nnnn-nnnn')]
>>> df.select(mask(df.data, lit('Y'), lit('y')).alias('r')).collect()
[Row(r='YyYYnnn-@$#'), Row(r='yyyy-YYYY-nnnn-nnnn')]
>>> df.select(mask(df.data, lit('Y'), lit('y'), lit('d')).alias('r')).collect()
[Row(r='YyYYddd-@$#'), Row(r='yyyy-YYYY-dddd-dddd')]
>>> df.select(mask(df.data, lit('Y'), lit('y'), lit('d'), lit('*')).alias('r')).collect()
[Row(r='YyYYddd****'), Row(r='yyyy*YYYY*dddd*dddd')]
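The per-character replacement rule can be sketched in plain Python. `mask_str` below is a hypothetical helper, not part of PySpark; it approximates the function's semantics (character classification here uses Python's Unicode rules, which may differ from Spark's at the edges) with `None` standing in for the NULL case that retains the original character.

```python
def mask_str(s, upper='X', lower='x', digit='n', other=None):
    """Sketch of the mask() rule: replace each character by class,
    or keep it unchanged when the replacement is None (NULL)."""
    out = []
    for ch in s:
        if ch.isupper():
            out.append(upper if upper is not None else ch)
        elif ch.islower():
            out.append(lower if lower is not None else ch)
        elif ch.isdigit():
            out.append(digit if digit is not None else ch)
        else:
            out.append(other if other is not None else ch)
    return ''.join(out)

print(mask_str("AbCD123-@$#"))              # XxXXnnn-@$#
print(mask_str("AbCD123-@$#", upper=None))  # AxCDnnn-@$#
```

The second call illustrates the NULL behaviour described in the parameter list: upper-case characters pass through unchanged while lower-case characters and digits are still masked.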