object CopybookParser extends Logging
The object contains generic functions for the Copybook parser.
Type Members
- type CopybookAST = Group
- case class CopybookLine(level: Int, name: String, lineNumber: Int, modifiers: Map[String, String]) extends Product with Serializable
- case class RecordBoundary(name: String, begin: Int, end: Int) extends Product with Serializable
- case class StatementLine(lineNumber: Int, text: String) extends Product with Serializable
- case class StatementTokens(lineNumber: Int, tokens: Array[String]) extends Product with Serializable
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
findCycleInAMap(m: Map[String, String]): List[String]
Finds a cycle in a parent-child relation map.
- m
A mapping from field name to its parent field name.
- returns
A list of fields in a cycle if there is one, an empty list otherwise
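The cycle check described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the object name CycleSketch and the function findCycle are hypothetical, and the order of the returned fields is an assumption, since the documentation only says that the fields of a cycle are returned.

```scala
object CycleSketch {
  /** Follows child -> parent links from each field and returns the fields
    * of the first cycle found, or Nil when the map contains no cycle.
    */
  def findCycle(m: Map[String, String]): List[String] = {
    // Walk up from a field, remembering visited fields (most recent first).
    @scala.annotation.tailrec
    def walk(current: String, path: List[String]): List[String] =
      if (path.contains(current)) {
        // Restore visiting order and keep only the part that forms the loop.
        path.reverse.dropWhile(_ != current)
      } else m.get(current) match {
        case Some(parent) => walk(parent, current :: path)
        case None         => Nil // reached a root field, so no cycle on this path
      }

    // Sorted keys make the result deterministic for this sketch.
    m.keys.toList.sorted.iterator.map(k => walk(k, Nil)).find(_.nonEmpty).getOrElse(Nil)
  }
}
```

With this sketch, a map such as Map("A" -> "B", "B" -> "A") yields the two fields of the loop, while a map without loops yields an empty list.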
-
def
getAllSegmentRedefines(schema: CopybookAST): List[Group]
Given an AST of a copybook, returns the list of all segment redefine GROUPs.
- schema
An AST as a set of copybook records
- returns
A list of segment redefine GROUPs
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getParentToChildrenMap(schema: CopybookAST): Map[String, Seq[Group]]
Given an AST of a copybook, returns a map from segment redefines to their children.
- schema
An AST as a set of copybook records
- returns
A map from segment redefines to their children
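As a rough illustration of the parent-to-children relation, the sketch below inverts a child-to-parent map of field names. It is a simplification under stated assumptions: the real method works on the copybook AST and returns Group values, whereas this sketch uses plain strings, and the names ParentChildrenSketch and invert are hypothetical.

```scala
object ParentChildrenSketch {
  /** Inverts a child -> parent map into a parent -> children map.
    * Children are sorted by name so the result is deterministic.
    */
  def invert(fieldParentMap: Map[String, String]): Map[String, Seq[String]] =
    fieldParentMap
      .groupBy { case (_, parent) => parent }
      .map { case (parent, entries) => parent -> entries.keys.toSeq.sorted }
}
```

Segments that have no children do not appear as keys in this sketch; whether the library includes them with an empty sequence is not stated in the documentation.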
-
def
getRootSegmentAST(schema: CopybookAST): CopybookAST
Given an AST of a copybook, returns a new AST that does not contain child segments.
- schema
An AST as a set of copybook records
- returns
A new AST that does not contain child segments
-
def
getRootSegmentIds(segmentIdRedefineMap: Map[String, String], fieldParentMap: Map[String, String]): List[String]
Returns a list of segment ID values for the root segment.
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logger: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
parse(copyBookContents: String, dataEncoding: Encoding = EBCDIC, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, fillerNamingPolicy: FillerNamingPolicy = FillerNamingPolicy.SequenceNumbers, segmentRedefines: Seq[String] = Nil, fieldParentMap: Map[String, String] = HashMap[String, String](), stringTrimmingPolicy: StringTrimmingPolicy = StringTrimmingPolicy.TrimBoth, commentPolicy: CommentPolicy = CommentPolicy(), strictSignOverpunch: Boolean = true, improvedNullDetection: Boolean = false, decodeBinaryAsHex: Boolean = false, ebcdicCodePage: CodePage = new CodePageCommon, asciiCharset: Charset = StandardCharsets.US_ASCII, isUtf16BigEndian: Boolean = true, floatingPointFormat: FloatingPointFormat = FloatingPointFormat.IBM, nonTerminals: Seq[String] = Nil, occursHandlers: Map[String, Map[String, Int]] = Map(), debugFieldsPolicy: DebugFieldsPolicy = DebugFieldsPolicy.NoDebug, fieldCodePageMap: Map[String, String] = Map.empty[String, String]): Copybook
Tokenizes the contents of a Cobol copybook and returns the AST.
- copyBookContents
A string containing all lines of a copybook
- dataEncoding
Encoding of the data file (either ASCII or EBCDIC). The encoding of the copybook is expected to be ASCII.
- dropGroupFillers
Drop groups marked as fillers from the output AST
- dropValueFillers
Drop primitive fields marked as fillers from the output AST
- fillerNamingPolicy
Specifies a naming policy for fillers
- segmentRedefines
A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.
- fieldParentMap
A mapping from each segment field to its parent field
- stringTrimmingPolicy
Specifies if and how strings should be trimmed when parsed
- commentPolicy
Specifies a policy for truncating comments inside a copybook
- strictSignOverpunch
If true, sign overpunching is not allowed for unsigned numbers
- improvedNullDetection
If true, string values that contain only zero bytes (0x0) will be considered null.
- ebcdicCodePage
A code page for EBCDIC encoded data
- asciiCharset
A charset for ASCII encoded data
- isUtf16BigEndian
If true UTF-16 strings are considered big-endian.
- floatingPointFormat
A format of floating-point numbers (IBM/IEEE754)
- nonTerminals
A list of non-terminals that should be extracted as strings
- debugFieldsPolicy
Specifies whether debugging fields should be added and what they should contain (false, hex, raw).
- returns
A Copybook wrapping the parsed AST, where each group is a record inside the copybook
-
def
parseSimple(copyBookContents: String, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, commentPolicy: CommentPolicy = CommentPolicy(), dropFillersFromAst: Boolean = false): Copybook
Tokenizes the contents of a Cobol copybook and returns the AST.
This method accepts arguments that affect only structure of the output AST.
- copyBookContents
A string containing all lines of a copybook
- dropGroupFillers
Drop GROUPs marked as fillers from the output AST. The name of this parameter is retained for compatibility; fields are not actually removed from the AST unless dropFillersFromAst is set to true. When dropGroupFillers is set to true, FILLER fields retain their names, and 'isFiller() = true' for FILLER GROUPs. When dropGroupFillers is set to false, FILLER fields are renamed to 'FILLER_1, FILLER_2, ...' to keep names unique in the output schema.
- dropValueFillers
Drop primitive fields marked as fillers from the output AST. The name of this parameter is retained for compatibility; fields are not actually removed from the AST unless dropFillersFromAst is set to true. When dropValueFillers is set to true, FILLER fields retain their names, and 'isFiller() = true' for FILLER primitive fields. When dropValueFillers is set to false, FILLER fields are renamed to 'FILLER_P1, FILLER_P2, ...' to keep names unique in the output schema.
- commentPolicy
Specifies a policy for truncating comments inside a copybook
- dropFillersFromAst
If true, fillers are dropped from the AST according to dropGroupFillers and dropValueFillers. If false, fillers remain in the AST but can still be recognized by the 'isFiller()' method.
- returns
A Copybook wrapping the parsed AST, where each group is a record inside the copybook
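The filler renaming described for dropGroupFillers and dropValueFillers can be pictured with a small sketch. It only illustrates the numbering scheme on a flat list of names; the object FillerNamingSketch, the function renameFillers and its prefix parameter are hypothetical, and the library's own traversal of the AST is not shown here.

```scala
object FillerNamingSketch {
  /** Renames every FILLER to prefix + sequence number, in order of appearance.
    * All other names are returned unchanged.
    */
  def renameFillers(names: Seq[String], prefix: String = "FILLER_"): Seq[String] = {
    // Running count of fillers seen so far; the first filler gets number 1.
    var counter = 0
    names.map { name =>
      if (name.equalsIgnoreCase("FILLER")) {
        counter += 1
        s"$prefix$counter"
      } else name
    }
  }
}
```

Passing prefix = "FILLER_P" gives the naming pattern mentioned for primitive fields, so GROUP and primitive fillers stay distinguishable in the output schema.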
-
def
parseTree(enc: Encoding, copyBookContents: String, dropGroupFillers: Boolean, dropValueFillers: Boolean, fillerNamingPolicy: FillerNamingPolicy, segmentRedefines: Seq[String], fieldParentMap: Map[String, String], stringTrimmingPolicy: StringTrimmingPolicy, commentPolicy: CommentPolicy, strictSignOverpunch: Boolean, improvedNullDetection: Boolean, decodeBinaryAsHex: Boolean, ebcdicCodePage: CodePage, asciiCharset: Charset, isUtf16BigEndian: Boolean, floatingPointFormat: FloatingPointFormat, nonTerminals: Seq[String], occursHandlers: Map[String, Map[String, Int]], debugFieldsPolicy: DebugFieldsPolicy, fieldCodePageMap: Map[String, String]): Copybook
Tokenizes the contents of a Cobol copybook and returns the AST.
- enc
Encoding of the data file (either ASCII or EBCDIC). The encoding of the copybook is expected to be ASCII.
- copyBookContents
A string containing all lines of a copybook
- dropGroupFillers
Drop groups marked as fillers from the output AST
- dropValueFillers
Drop primitive fields marked as fillers from the output AST
- fillerNamingPolicy
Specifies a naming policy for fillers
- segmentRedefines
A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.
- fieldParentMap
A mapping from each segment field to its parent field
- stringTrimmingPolicy
Specifies if and how strings should be trimmed when parsed
- commentPolicy
Specifies a policy for truncating comments inside a copybook
- improvedNullDetection
If true, string values that contain only zero bytes (0x0) will be considered null.
- ebcdicCodePage
A code page for EBCDIC encoded data
- asciiCharset
A charset for ASCII encoded data
- isUtf16BigEndian
If true UTF-16 strings are considered big-endian.
- floatingPointFormat
A format of floating-point numbers (IBM/IEEE754)
- nonTerminals
A list of non-terminals that should be extracted as strings
- debugFieldsPolicy
Specifies whether debugging fields should be added and what they should contain (false, hex, raw).
- returns
A Copybook wrapping the parsed AST, where each group is a record inside the copybook
- Annotations
- @throws( classOf[SyntaxErrorException] )
-
def
parseTree(copyBookContents: String, dropGroupFillers: Boolean = false, dropValueFillers: Boolean = true, fillerNamingPolicy: FillerNamingPolicy = FillerNamingPolicy.SequenceNumbers, segmentRedefines: Seq[String] = Nil, fieldParentMap: Map[String, String] = HashMap[String, String](), stringTrimmingPolicy: StringTrimmingPolicy = StringTrimmingPolicy.TrimBoth, commentPolicy: CommentPolicy = CommentPolicy(), strictSignOverpunch: Boolean = true, improvedNullDetection: Boolean = false, decodeBinaryAsHex: Boolean = false, ebcdicCodePage: CodePage = new CodePageCommon, asciiCharset: Charset = StandardCharsets.US_ASCII, isUtf16BigEndian: Boolean = true, floatingPointFormat: FloatingPointFormat = FloatingPointFormat.IBM, nonTerminals: Seq[String] = Nil, occursHandlers: Map[String, Map[String, Int]] = Map(), debugFieldsPolicy: DebugFieldsPolicy = DebugFieldsPolicy.NoDebug, fieldCodePageMap: Map[String, String] = Map.empty[String, String]): Copybook
Tokenizes the contents of a Cobol copybook and returns the AST.
- copyBookContents
A string containing all lines of a copybook
- dropGroupFillers
Drop groups marked as fillers from the output AST
- dropValueFillers
Drop primitive fields marked as fillers from the output AST
- fillerNamingPolicy
Specifies a naming policy for fillers
- segmentRedefines
A list of redefined fields that correspond to various segments. This needs to be specified for automatically resolving segment redefines.
- fieldParentMap
A mapping from each segment field to its parent field
- stringTrimmingPolicy
Specifies if and how strings should be trimmed when parsed
- commentPolicy
Specifies a policy for truncating comments inside a copybook
- strictSignOverpunch
If true, sign overpunching is not allowed for unsigned numbers
- improvedNullDetection
If true, string values that contain only zero bytes (0x0) will be considered null.
- ebcdicCodePage
A code page for EBCDIC encoded data
- asciiCharset
A charset for ASCII encoded data
- isUtf16BigEndian
If true UTF-16 strings are considered big-endian.
- floatingPointFormat
A format of floating-point numbers (IBM/IEEE754)
- nonTerminals
A list of non-terminals that should be extracted as strings
- debugFieldsPolicy
Specifies whether debugging fields should be added and what they should contain (false, hex, raw).
- returns
A Copybook wrapping the parsed AST, where each group is a record inside the copybook
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
transformIdentifier(identifier: String): String
Transforms Cobol identifiers so they are usable in a Spark context. Removes characters an identifier cannot contain.
-
def
transformIdentifierMap(identifierMap: Map[String, String]): Map[String, String]
Transforms all identifiers in a map so they are usable in a Spark context. Removes characters an identifier cannot contain.
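A possible shape of such a transformation is sketched below. The exact rules are not given in this documentation, so the ones used here are assumptions: hyphens become underscores, and any character outside letters, digits and underscores is dropped. The names IdentifierSketch, sanitize and sanitizeMap are hypothetical, and applying the rule to both keys and values of the map is likewise an assumption.

```scala
object IdentifierSketch {
  /** Assumed rule: replace '-' with '_', then drop anything outside [A-Za-z0-9_]. */
  def sanitize(identifier: String): String =
    identifier.replace('-', '_').replaceAll("[^A-Za-z0-9_]", "")

  /** Applies the same rule to every key and value of an identifier map. */
  def sanitizeMap(identifierMap: Map[String, String]): Map[String, String] =
    identifierMap.map { case (key, value) => sanitize(key) -> sanitize(value) }
}
```

Under these assumed rules a Cobol name such as CUSTOMER-ID becomes CUSTOMER_ID, which is a valid column name in most SQL-like contexts.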
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()