Class AvroUtils


  • public class AvroUtils
    extends Object
    Utilities for handling Avro records.
    • Method Detail

      • getPinotSchemaFromAvroSchema

        public static Schema getPinotSchemaFromAvroSchema(org.apache.avro.Schema avroSchema,
                                                          @Nullable
                                                          Map<String, FieldSpec.FieldType> fieldTypeMap,
                                                          @Nullable
                                                          TimeUnit timeUnit)
        Given an Avro schema, a map from column name to field type, and a time unit, returns the equivalent Pinot schema.
        Parameters:
        avroSchema - Avro schema
        fieldTypeMap - Map from column to field type
        timeUnit - Time unit
        Returns:
        Pinot schema
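        The field-type resolution step can be pictured with the JDK-only sketch below. It is not Pinot's implementation: FieldTypeResolver, resolve, and the local FieldType enum are hypothetical names, and the fallback to DIMENSION for columns that are missing from fieldTypeMap (or when the map is null) is an assumption. The timeUnit parameter is not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of per-column field-type resolution.
// FieldType borrows some value names from Pinot's FieldSpec.FieldType but is NOT that enum.
class FieldTypeResolver {

  enum FieldType { DIMENSION, METRIC, TIME, DATE_TIME }

  // Assumption: a null map, or a column absent from the map, falls back to DIMENSION.
  // The time unit parameter of the real method is not modeled here.
  static Map<String, FieldType> resolve(List<String> columns, Map<String, FieldType> fieldTypeMap) {
    Map<String, FieldType> resolved = new LinkedHashMap<>();
    for (String column : columns) {
      FieldType type = (fieldTypeMap == null) ? null : fieldTypeMap.get(column);
      resolved.put(column, (type == null) ? FieldType.DIMENSION : type);
    }
    return resolved;
  }
}
```

        Passing null as the map mirrors the behavior described for the single-argument getPinotSchemaFromAvroDataFile, where every column is treated as a dimension.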
      • getPinotSchemaFromAvroSchemaWithComplexTypeHandling

        public static Schema getPinotSchemaFromAvroSchemaWithComplexTypeHandling(org.apache.avro.Schema avroSchema,
                                                                                 @Nullable
                                                                                 Map<String, FieldSpec.FieldType> fieldTypeMap,
                                                                                 @Nullable
                                                                                 TimeUnit timeUnit,
                                                                                 List<String> fieldsToUnnest,
                                                                                 String delimiter,
                                                                                 ComplexTypeConfig.CollectionNotUnnestedToJson collectionNotUnnestedToJson)
        Given an Avro schema, flattens and unnests its complex types according to the config, applies the field type map and time unit, and returns the equivalent Pinot schema.
        Parameters:
        avroSchema - Avro schema
        fieldTypeMap - Map from column to field type
        timeUnit - Time unit
        fieldsToUnnest - the fields to unnest
        delimiter - the delimiter used to separate components of a nested structure
        collectionNotUnnestedToJson - the mode for converting collections that are not unnested to JSON
        Returns:
        Pinot schema
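        As a rough illustration of the flattening part of complex-type handling, the following sketch joins nested record field names with the delimiter. It is a hypothetical, dependency-free model (RecordFlattener and flatten are invented names, nested records are plain Maps, and leaves are type-name strings). It does not model fieldsToUnnest or collectionNotUnnestedToJson, and Pinot's real logic may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: flattens nested record fields into delimiter-joined column names.
// A nested record is modeled as a Map; a leaf field is modeled as a type-name String.
// Unnesting of collections (fieldsToUnnest) and JSON conversion are NOT modeled here.
class RecordFlattener {

  static Map<String, String> flatten(Map<String, Object> record, String delimiter) {
    Map<String, String> columns = new LinkedHashMap<>();
    flattenInto("", record, delimiter, columns);
    return columns;
  }

  @SuppressWarnings("unchecked")
  private static void flattenInto(String prefix, Map<String, Object> record, String delimiter,
      Map<String, String> columns) {
    for (Map.Entry<String, Object> entry : record.entrySet()) {
      String name = prefix.isEmpty() ? entry.getKey() : prefix + delimiter + entry.getKey();
      Object value = entry.getValue();
      if (value instanceof Map) {
        // Nested record: recurse, extending the column-name prefix.
        flattenInto(name, (Map<String, Object>) value, delimiter, columns);
      } else {
        // Leaf field: store its type name under the flattened column name.
        columns.put(name, String.valueOf(value));
      }
    }
  }
}
```

        With "." as the delimiter, a record with fields id and address { city, zip } yields the columns id, address.city, and address.zip.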
      • getPinotSchemaFromAvroDataFile

        public static Schema getPinotSchemaFromAvroDataFile(File avroDataFile,
                                                            @Nullable
                                                            Map<String, FieldSpec.FieldType> fieldTypeMap,
                                                            @Nullable
                                                            TimeUnit timeUnit)
                                                     throws IOException
        Given an Avro data file, a map from column name to field type, and a time unit, returns the equivalent Pinot schema.
        Parameters:
        avroDataFile - Avro data file
        fieldTypeMap - Map from column to field type
        timeUnit - Time unit
        Returns:
        Pinot schema
        Throws:
        IOException
      • getPinotSchemaFromAvroDataFile

        public static Schema getPinotSchemaFromAvroDataFile(File avroDataFile)
                                                     throws IOException
        Given an Avro data file, treats all columns as dimensions and returns the equivalent Pinot schema.

        Should be used for testing purposes only.

        Parameters:
        avroDataFile - Avro data file
        Returns:
        Pinot schema
        Throws:
        IOException
      • getPinotSchemaFromAvroSchemaFile

        public static Schema getPinotSchemaFromAvroSchemaFile(File avroSchemaFile,
                                                              @Nullable
                                                              Map<String, FieldSpec.FieldType> fieldTypeMap,
                                                              @Nullable
                                                              TimeUnit timeUnit,
                                                              boolean complexType,
                                                              List<String> fieldsToUnnest,
                                                              String delimiter,
                                                              ComplexTypeConfig.CollectionNotUnnestedToJson collectionNotUnnestedToJson)
                                                       throws IOException
        Given an Avro schema file, a map from column name to field type, and a time unit, returns the equivalent Pinot schema.
        Parameters:
        avroSchemaFile - Avro schema file
        fieldTypeMap - Map from column to field type
        timeUnit - Time unit
        complexType - whether to enable complex-type handling
        fieldsToUnnest - the fields to unnest
        delimiter - the delimiter used to separate components of a nested structure
        collectionNotUnnestedToJson - the mode for converting collections that are not unnested to JSON strings
        Returns:
        Pinot schema
        Throws:
        IOException
      • getAvroSchemaFromPinotSchema

        public static org.apache.avro.Schema getAvroSchemaFromPinotSchema(Schema pinotSchema)
        Builds an Avro schema from the given Pinot schema.
        Parameters:
        pinotSchema - Pinot schema.
        Returns:
        Avro schema.
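        To show the general shape of a Pinot-to-Avro conversion, here is a minimal sketch that writes an Avro record schema as JSON text. AvroSchemaSketch, Column, toAvroSchemaJson, the record name parameter, and the type mapping are assumptions for illustration only. It covers just a few primitive types and represents a multi-value column as an Avro array, which is the natural Avro encoding but is not confirmed by this page.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: emits an Avro record schema (as JSON text) from simplified Pinot-like columns.
class AvroSchemaSketch {

  // Simplified stand-in for a Pinot FieldSpec: name, data type name, and single/multi-value flag.
  static final class Column {
    final String name;        // assumed to be a valid Avro name, so no JSON escaping is needed
    final String dataType;    // e.g. "INT", "LONG", "FLOAT", "DOUBLE", "STRING", "BYTES"
    final boolean singleValue;

    Column(String name, String dataType, boolean singleValue) {
      this.name = name;
      this.dataType = dataType;
      this.singleValue = singleValue;
    }
  }

  // Assumed mapping: these Pinot type names map to the Avro primitive of the same (lowercase) name.
  static String avroPrimitive(String pinotType) {
    switch (pinotType) {
      case "INT": case "LONG": case "FLOAT": case "DOUBLE": case "STRING": case "BYTES":
        return pinotType.toLowerCase(Locale.ROOT);
      default:
        throw new IllegalArgumentException("Unsupported data type in this sketch: " + pinotType);
    }
  }

  static String toAvroSchemaJson(String recordName, List<Column> columns) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"type\":\"record\",\"name\":\"").append(recordName).append("\",\"fields\":[");
    for (int i = 0; i < columns.size(); i++) {
      Column c = columns.get(i);
      String itemType = "\"" + avroPrimitive(c.dataType) + "\"";
      // Multi-value columns become Avro arrays of the element type.
      String type = c.singleValue ? itemType : "{\"type\":\"array\",\"items\":" + itemType + "}";
      if (i > 0) {
        sb.append(',');
      }
      sb.append("{\"name\":\"").append(c.name).append("\",\"type\":").append(type).append('}');
    }
    return sb.append("]}").toString();
  }
}
```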
      • getAvroReader

        public static org.apache.avro.file.DataFileStream<org.apache.avro.generic.GenericRecord> getAvroReader(File avroFile)
                                                                                                        throws IOException
        Returns an Avro file reader for the given file.
        Throws:
        IOException
      • isSingleValueField

        public static boolean isSingleValueField(org.apache.avro.Schema.Field field)
        Return whether the Avro field is a single-value field.
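        The single-value check can be sketched without the Avro library by modeling a field schema as the list of its union branch type names. This is an assumed model (SingleValueCheck and isSingleValue are hypothetical names): null branches are ignored, only unions with exactly one non-null branch are accepted, and a field counts as multi-value when its remaining type is an Avro array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the single-value check. An Avro field schema is modeled as the list of
// its union branch type names (a non-union schema is a one-element list), e.g. ["null", "array"].
class SingleValueCheck {

  static boolean isSingleValue(List<String> branchTypes) {
    List<String> nonNull = new ArrayList<>();
    for (String t : branchTypes) {
      if (!t.equals("null")) {
        nonNull.add(t);
      }
    }
    // Assumption: only unions with exactly one non-null branch are supported.
    if (nonNull.size() != 1) {
      throw new IllegalArgumentException("Unsupported union: " + branchTypes);
    }
    // A field is multi-value when its effective type is an Avro array.
    return !nonNull.get(0).equals("array");
  }
}
```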
      • extractFieldDataType

        public static FieldSpec.DataType extractFieldDataType(org.apache.avro.Schema.Field field)
        Extract the data type stored in Pinot for the given Avro field.
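        A hedged sketch of an Avro-to-Pinot data type mapping follows. DataTypeMapping and toPinotDataType are hypothetical names, and the sketch assumes null union branches have already been stripped and, for array fields, that the element type is passed in. The pairing of Avro primitive names with Pinot DataType names is an assumption; the exact mapping in AvroUtils (for example for boolean, enum, or fixed) may differ by Pinot version.

```java
import java.util.Locale;

// Hypothetical sketch of mapping an Avro primitive type name to a Pinot data type name.
// Assumes "null" union branches were already removed and array fields pass their element type.
class DataTypeMapping {

  static String toPinotDataType(String avroType) {
    switch (avroType.toLowerCase(Locale.ROOT)) {
      case "int":
        return "INT";
      case "long":
        return "LONG";
      case "float":
        return "FLOAT";
      case "double":
        return "DOUBLE";
      case "boolean":
        return "BOOLEAN";   // assumption: older Pinot versions may have used STRING here
      case "string":
      case "enum":          // enum symbols are assumed to be stored as strings
        return "STRING";
      case "bytes":
      case "fixed":
        return "BYTES";
      default:
        throw new IllegalArgumentException("Unsupported Avro type in this sketch: " + avroType);
    }
  }
}
```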