An extension to FsDataWriter that writes data in Parquet format. This implementation lets users specify the CodecFactory to use through the configuration property writer.codec.type. By default, the deflate codec is used.



For more information, see ParquetHdfsDataWriter and ParquetDataWriterBuilder.


| Key | Description | Default Value | Required |
|---|---|---|---|
| | The page size threshold. | 1048576 | No |
| | The block size threshold for the dictionary pages. | 134217728 | No |
| writer.parquet.dictionary | To turn dictionary encoding on. Parquet has a dictionary encoding for data with a small number of unique values (< 10^5) that aids in significant compression and boosts processing speed. | true | No |
| writer.parquet.validate | To turn on validation using the schema. This validation is done by ParquetWriter, not by Gobblin. | false | No |
| writer.parquet.version | Version of parquet writer to use. Available versions are v1 and v2. | v1 | No |
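
To make the default behaviour concrete, here is a minimal sketch of how a job's settings could be layered over the defaults documented on this page. It is not Gobblin's own code: the class ParquetWriterConfigSketch and its helper methods are hypothetical and exist only for illustration, and the sketch covers just the four keys named on this page (writer.codec.type plus the three writer.parquet.* keys). It uses only java.util.Properties from the standard library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration class; not part of Gobblin.
public class ParquetWriterConfigSketch {

    // Defaults copied from the documentation on this page.
    static Properties documentedDefaults() {
        Properties defaults = new Properties();
        defaults.setProperty("writer.codec.type", "deflate");
        defaults.setProperty("writer.parquet.dictionary", "true");
        defaults.setProperty("writer.parquet.validate", "false");
        defaults.setProperty("writer.parquet.version", "v1");
        return defaults;
    }

    // Hypothetical helper: job settings win, documented defaults fill the gaps.
    static Properties effectiveSettings(Properties job) {
        Properties effective = new Properties(documentedDefaults());
        for (String key : job.stringPropertyNames()) {
            effective.setProperty(key, job.getProperty(key));
        }
        // The documentation lists v1 and v2 as the only available versions.
        String version = effective.getProperty("writer.parquet.version");
        if (!version.equals("v1") && !version.equals("v2")) {
            throw new IllegalArgumentException(
                "writer.parquet.version must be v1 or v2, got: " + version);
        }
        return effective;
    }

    public static void main(String[] args) throws IOException {
        // A job-file fragment that overrides two of the optional keys.
        String jobFragment =
            "writer.parquet.version=v2\n" +
            "writer.parquet.validate=true\n";

        Properties job = new Properties();
        job.load(new StringReader(jobFragment));

        Properties effective = effectiveSettings(job);
        for (String key : documentedDefaults().stringPropertyNames()) {
            System.out.println(key + "=" + effective.getProperty(key));
        }
    }
}
```

Because every key in the table is optional, a job file only needs to list the settings it wants to change; in this sketch the codec and dictionary settings fall back to deflate and true.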