za.co.absa.cobrix.cobol.parser

CopybookParser

object CopybookParser extends Logging

This object contains generic functions for the Copybook parser.

Linear Supertypes
Logging, AnyRef, Any

Type Members

  1. type CopybookAST = Group
  2. case class CopybookLine(level: Int, name: String, lineNumber: Int, modifiers: Map[String, String]) extends Product with Serializable
  3. case class RecordBoundary(name: String, begin: Int, end: Int) extends Product with Serializable
  4. case class StatementLine(lineNumber: Int, text: String) extends Product with Serializable
  5. case class StatementTokens(lineNumber: Int, tokens: Array[String]) extends Product with Serializable

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. def findCycleInAMap(m: Map[String, String]): List[String]

    Finds a cycle in a parent-child relation map.

    m

    A mapping from field name to its parent field name.

    returns

    A list of fields in a cycle if there is one, an empty list otherwise
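
    To illustrate the idea, here is a minimal self-contained sketch of cycle detection in a child-to-parent map. It is not the Cobrix implementation: the object name CycleSketch, the helper findCycle, and the order in which the cycle members are returned are assumptions made for this example.

```scala
object CycleSketch {
  /** Follows child -> parent links from every key and returns the fields
    * on the first cycle found, or Nil when the map contains no cycle. */
  def findCycle(m: Map[String, String]): List[String] = {
    // `path` holds the fields visited so far, most recent first.
    @scala.annotation.tailrec
    def walk(current: String, path: List[String]): List[String] = {
      if (path.contains(current)) {
        // We returned to a visited field: keep the part of the path that forms the loop.
        path.reverse.dropWhile(_ != current)
      } else {
        m.get(current) match {
          case Some(parent) => walk(parent, current :: path)
          case None         => Nil // reached a root, so this path has no cycle
        }
      }
    }

    // Keys are sorted only to make the result deterministic.
    m.keys.toList.sorted.iterator.map(k => walk(k, Nil)).find(_.nonEmpty).getOrElse(Nil)
  }
}
```

    With this sketch, a map such as A -> B, B -> C, C -> A yields the three fields of the loop, while a plain chain A -> B -> C yields an empty list.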

  10. def getAllSegmentRedefines(schema: CopybookAST): List[Group]

    Given an AST of a copybook, returns the list of all segment redefine GROUPs.

    schema

    An AST as a set of copybook records

    returns

    A list of segment redefine GROUPs

  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def getParentToChildrenMap(schema: CopybookAST): Map[String, Seq[Group]]

    Given an AST of a copybook, returns a map from segment redefines to their children.

    schema

    An AST as a set of copybook records

    returns

    A map from segment redefines to their children
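
    The shape of such a map can be sketched without the parser. The example below is hypothetical: Segment is a simplified stand-in for a segment redefine GROUP (it is not Cobrix's Group class), and listing leaf segments with an empty child list is an assumption of this sketch.

```scala
object ParentChildSketch {
  // Hypothetical stand-in for a segment redefine GROUP: a name and an optional parent name.
  final case class Segment(name: String, parent: Option[String])

  /** Maps every segment name to the segments that declare it as their parent. */
  def parentToChildren(segments: Seq[Segment]): Map[String, Seq[Segment]] =
    segments.map { s =>
      s.name -> segments.filter(_.parent.contains(s.name))
    }.toMap
}
```

    For a COMPANY segment with CONTACT and ADDRESS children, the sketch maps COMPANY to both children and maps each child to an empty sequence.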

  13. def getRootSegmentAST(schema: CopybookAST): CopybookAST

    Given an AST of a copybook, returns a new AST that does not contain child segments.

    schema

    An AST as a set of copybook records

    returns

    A new AST that does not contain child segments

  14. def getRootSegmentIds(segmentIdRedefineMap: Map[String, String], fieldParentMap: Map[String, String]): List[String]

    Returns a list of segment id values for the root segment.
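
    One plausible reading of this method can be sketched as follows. The sketch assumes that segmentIdRedefineMap maps a segment id value to the name of its segment redefine, that fieldParentMap maps a field name to its parent field name, and that a root is a field that appears as a parent but never as a child. These are assumptions of the example, not documented behavior, and the names RootSegmentSketch and rootSegmentIds are hypothetical.

```scala
object RootSegmentSketch {
  /** Returns the segment ids whose redefine is a root under the assumptions above. */
  def rootSegmentIds(segmentIdRedefineMap: Map[String, String],
                     fieldParentMap: Map[String, String]): List[String] = {
    // Root fields appear as a parent (map value) but never as a child (map key).
    val rootFields = fieldParentMap.values.toSet -- fieldParentMap.keySet
    segmentIdRedefineMap.toList
      .collect { case (segmentId, redefine) if rootFields.contains(redefine) => segmentId }
      .sorted // sorted only to make the result deterministic
  }
}
```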

  15. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  18. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  19. def logger: Logger
    Attributes
    protected
    Definition Classes
    Logging
  20. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  22. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  23. def parse(copyBookContents: String, dataEncoding: Encoding = EBCDIC, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, fillerNamingPolicy: FillerNamingPolicy = FillerNamingPolicy.SequenceNumbers, segmentRedefines: Seq[String] = Nil, fieldParentMap: Map[String, String] = HashMap[String, String](), stringTrimmingPolicy: StringTrimmingPolicy = StringTrimmingPolicy.TrimBoth, commentPolicy: CommentPolicy = CommentPolicy(), strictSignOverpunch: Boolean = true, improvedNullDetection: Boolean = false, decodeBinaryAsHex: Boolean = false, ebcdicCodePage: CodePage = new CodePageCommon, asciiCharset: Charset = StandardCharsets.US_ASCII, isUtf16BigEndian: Boolean = true, floatingPointFormat: FloatingPointFormat = FloatingPointFormat.IBM, nonTerminals: Seq[String] = Nil, occursHandlers: Map[String, Map[String, Int]] = Map(), debugFieldsPolicy: DebugFieldsPolicy = DebugFieldsPolicy.NoDebug, fieldCodePageMap: Map[String, String] = Map.empty[String, String]): Copybook

    Tokenizes the contents of a COBOL copybook and returns the AST.

    copyBookContents

    A string containing all lines of a copybook

    dataEncoding

    Encoding of the data file (either ASCII or EBCDIC). The encoding of the copybook is expected to be ASCII.

    dropGroupFillers

    Drop groups marked as fillers from the output AST

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST

    fillerNamingPolicy

    Specifies a naming policy for fillers

    segmentRedefines

    A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.

    fieldParentMap

    A segment fields parent mapping

    stringTrimmingPolicy

    Specifies if and how strings should be trimmed when parsed

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    strictSignOverpunch

    If true, sign overpunching is not allowed for unsigned numbers.

    improvedNullDetection

    If true, string values that contain only zero bytes (0x0) will be considered null.

    ebcdicCodePage

    A code page for EBCDIC encoded data

    asciiCharset

    A charset for ASCII encoded data

    isUtf16BigEndian

    If true, UTF-16 strings are considered big-endian.

    floatingPointFormat

    A format of floating-point numbers (IBM/IEEE754)

    nonTerminals

    A list of non-terminals that should be extracted as strings

    debugFieldsPolicy

    Specifies whether debugging fields need to be added and what they should contain (false, hex, raw).

    returns

    A Copybook wrapping the parsed AST, where each group is a record inside the copybook

  24. def parseSimple(copyBookContents: String, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, commentPolicy: CommentPolicy = CommentPolicy(), dropFillersFromAst: Boolean = false): Copybook

    Tokenizes the contents of a COBOL copybook and returns the AST.

    This method accepts only arguments that affect the structure of the output AST.

    copyBookContents

    A string containing all lines of a copybook

    dropGroupFillers

    Drop GROUPs marked as fillers from the output AST. The name of this parameter is retained for compatibility; fields are not actually removed from the AST unless dropFillersFromAst is set to true. When dropGroupFillers is set to true, FILLER fields retain their names, and 'isFiller() = true' for FILLER GROUPs. When dropGroupFillers is set to false, FILLER fields are renamed to 'FILLER_1, FILLER_2, ...' to keep names unique in the output schema.

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST. The name of this parameter is retained for compatibility; fields are not actually removed from the AST unless dropFillersFromAst is set to true. When dropValueFillers is set to true, FILLER fields retain their names, and 'isFiller() = true' for FILLER primitive fields. When dropValueFillers is set to false, FILLER fields are renamed to 'FILLER_P1, FILLER_P2, ...' to keep names unique in the output schema.

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    dropFillersFromAst

    If true, fillers are dropped from the AST according to dropGroupFillers and dropValueFillers. If false, fillers remain in the AST but can still be recognized via the 'isFiller()' method.

    returns

    A Copybook wrapping the parsed AST, where each group is a record inside the copybook
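
    The renaming described for dropGroupFillers = false and dropValueFillers = false can be sketched on a flat list of fields. This is an illustration only: Field is a hypothetical simplified type, not a Cobrix AST class, and the real parser works on a tree rather than a flat sequence.

```scala
object FillerNamingSketch {
  // Hypothetical simplified field: a name plus a flag telling GROUPs from primitive fields.
  final case class Field(name: String, isGroup: Boolean)

  /** Renames FILLER GROUPs to FILLER_1, FILLER_2, ... and FILLER primitives
    * to FILLER_P1, FILLER_P2, ... so that names stay unique. */
  def renameFillers(fields: Seq[Field]): Seq[Field] = {
    var groupCount = 0
    var primitiveCount = 0
    fields.map {
      case f if f.name.equalsIgnoreCase("FILLER") && f.isGroup =>
        groupCount += 1
        f.copy(name = s"FILLER_$groupCount")
      case f if f.name.equalsIgnoreCase("FILLER") =>
        primitiveCount += 1
        f.copy(name = s"FILLER_P$primitiveCount")
      case f => f // non-filler fields keep their names
    }
  }
}
```

    GROUP and primitive fillers use separate counters in this sketch, which matches the two naming patterns quoted in the parameter descriptions above.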

  25. def parseTree(enc: Encoding, copyBookContents: String, dropGroupFillers: Boolean, dropValueFillers: Boolean, fillerNamingPolicy: FillerNamingPolicy, segmentRedefines: Seq[String], fieldParentMap: Map[String, String], stringTrimmingPolicy: StringTrimmingPolicy, commentPolicy: CommentPolicy, strictSignOverpunch: Boolean, improvedNullDetection: Boolean, decodeBinaryAsHex: Boolean, ebcdicCodePage: CodePage, asciiCharset: Charset, isUtf16BigEndian: Boolean, floatingPointFormat: FloatingPointFormat, nonTerminals: Seq[String], occursHandlers: Map[String, Map[String, Int]], debugFieldsPolicy: DebugFieldsPolicy, fieldCodePageMap: Map[String, String]): Copybook

    Tokenizes the contents of a COBOL copybook and returns the AST.

    enc

    Encoding of the data file (either ASCII or EBCDIC). The encoding of the copybook is expected to be ASCII.

    copyBookContents

    A string containing all lines of a copybook

    dropGroupFillers

    Drop groups marked as fillers from the output AST

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST

    fillerNamingPolicy

    Specifies a naming policy for fillers

    segmentRedefines

    A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.

    fieldParentMap

    A segment fields parent mapping

    stringTrimmingPolicy

    Specifies if and how strings should be trimmed when parsed

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    improvedNullDetection

    If true, string values that contain only zero bytes (0x0) will be considered null.

    ebcdicCodePage

    A code page for EBCDIC encoded data

    asciiCharset

    A charset for ASCII encoded data

    isUtf16BigEndian

    If true, UTF-16 strings are considered big-endian.

    floatingPointFormat

    A format of floating-point numbers (IBM/IEEE754)

    nonTerminals

    A list of non-terminals that should be extracted as strings

    debugFieldsPolicy

    Specifies whether debugging fields need to be added and what they should contain (false, hex, raw).

    returns

    A Copybook wrapping the parsed AST, where each group is a record inside the copybook

    Annotations
    @throws( classOf[SyntaxErrorException] )
  26. def parseTree(copyBookContents: String, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, fillerNamingPolicy: FillerNamingPolicy = FillerNamingPolicy.SequenceNumbers, segmentRedefines: Seq[String] = Nil, fieldParentMap: Map[String, String] = HashMap[String, String](), stringTrimmingPolicy: StringTrimmingPolicy = StringTrimmingPolicy.TrimBoth, commentPolicy: CommentPolicy = CommentPolicy(), strictSignOverpunch: Boolean = true, improvedNullDetection: Boolean = false, decodeBinaryAsHex: Boolean = false, ebcdicCodePage: CodePage = new CodePageCommon, asciiCharset: Charset = StandardCharsets.US_ASCII, isUtf16BigEndian: Boolean = true, floatingPointFormat: FloatingPointFormat = FloatingPointFormat.IBM, nonTerminals: Seq[String] = Nil, occursHandlers: Map[String, Map[String, Int]] = Map(), debugFieldsPolicy: DebugFieldsPolicy = DebugFieldsPolicy.NoDebug, fieldCodePageMap: Map[String, String] = Map.empty[String, String]): Copybook

    Tokenizes the contents of a COBOL copybook and returns the AST.

    copyBookContents

    A string containing all lines of a copybook

    dropGroupFillers

    Drop groups marked as fillers from the output AST

    dropValueFillers

    Drop primitive fields marked as fillers from the output AST

    fillerNamingPolicy

    Specifies a naming policy for fillers

    segmentRedefines

    A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.

    fieldParentMap

    A segment fields parent mapping

    stringTrimmingPolicy

    Specifies if and how strings should be trimmed when parsed

    commentPolicy

    Specifies a policy for comments truncation inside a copybook

    strictSignOverpunch

    If true, sign overpunching is not allowed for unsigned numbers.

    improvedNullDetection

    If true, string values that contain only zero bytes (0x0) will be considered null.

    ebcdicCodePage

    A code page for EBCDIC encoded data

    asciiCharset

    A charset for ASCII encoded data

    isUtf16BigEndian

    If true, UTF-16 strings are considered big-endian.

    floatingPointFormat

    A format of floating-point numbers (IBM/IEEE754)

    nonTerminals

    A list of non-terminals that should be extracted as strings

    debugFieldsPolicy

    Specifies whether debugging fields need to be added and what they should contain (false, hex, raw).

    returns

    A Copybook wrapping the parsed AST, where each group is a record inside the copybook

  27. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  28. def toString(): String
    Definition Classes
    AnyRef → Any
  29. def transformIdentifier(identifier: String): String

    Transforms the Cobol identifiers to be useful in Spark context. Removes characters an identifier cannot contain.

  30. def transformIdentifierMap(identifierMap: Map[String, String]): Map[String, String]

    Transforms all identifiers in a map to be useful in Spark context. Removes characters an identifier cannot contain.
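
    The two transformIdentifier methods can be illustrated with a small stand-alone sketch. The exact set of characters Cobrix removes is not stated on this page, so the rules below (drop colons, turn hyphens into underscores, and apply the same rule to both keys and values of a map) are assumptions, and the names IdentifierSketch, sanitize and sanitizeMap are hypothetical.

```scala
object IdentifierSketch {
  /** Assumed rule: remove ':' and replace '-' with '_' so the name is usable as a column name. */
  def sanitize(identifier: String): String =
    identifier.replace(":", "").replace("-", "_")

  /** Applies the same rule to every key and value of the map (an assumption of this sketch). */
  def sanitizeMap(identifierMap: Map[String, String]): Map[String, String] =
    identifierMap.map { case (k, v) => sanitize(k) -> sanitize(v) }
}
```

    Under these assumptions a COBOL name such as CUSTOMER-ID becomes CUSTOMER_ID.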

  31. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  33. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
