Programming language profile and parser class for Keyword Oriented Syntax.
Provides a data structure for describing keyword-based languages that allows a naive processor to assign classifications to the tokens contained in a string.
Profile instances carry limited information about a language, and Parser instances are expected to use naive processing algorithms. This deficiency is intentional: Keywords-based interpretations should remain trivial to implement, and their parameter set small, so that a profile may be quickly and easily defined by users without requiring volumes of documentation to be consumed.
Parser Process Methods
Parser instances have two high-level methods for producing tokens: Parser.process_lines and Parser.process_document. The former resets the context provided to Parser.delimit for each line, and the latter maintains it throughout the lifetime of the generator.
While Parser.process_document appears desirable in many cases, the limitations of Profile instances may make it reasonable to choose Parser.process_lines in order to avoid the effects of an inaccurate profile or a language that maintains ambiguities with respect to the parser's capabilities.
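As a hedged sketch of the difference, assuming parser is a cached Parser.from_profile result and handle is a hypothetical consumer:

    # Sketch only; `parser` and `handle` are assumptions for illustration.
    lines = open("source.txt").readlines()

    # Per-line context resets; tolerant of an inaccurate profile, and the
    # produced iterators may be consumed in any order.
    for tokens in parser.process_lines(lines):
        handle(tokens)

    # Continuous context; comments and quotations may span lines, and the
    # produced iterators must be consumed in order.
    for tokens in parser.process_document(lines):
        handle(tokens)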
Engineering
Essentially, this is a lexer whose tokens are defined by Profile instances. The languages that are a good match for this approach are usually keyword based and leverage whitespace to isolate fields.
typing
itertools
functools
string
Tokens
Tokens = typing.Iterable[typing.Tuple[str,str,str]]
Profile
Data structure describing the elements of a Keyword Oriented Syntax.
Empty strings present in any of these sets will usually refer to the End of Line. This notation is primarily intended to support area exclusions for line comments, but literals and enclosures may also use it to represent the beginning or end of a line.
While Profile is a tuple subclass, indexes should not be used to access members.
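For example, the documented set types admit field values along these lines; the particular pairs shown are illustrative, not prescribed:

    exclusions = {("#", "")}               # '#' opens a comment ending at End of Line.
    literals = {('"', '"'), ("'", "'")}    # Quotation start/stop pairs.
    enclosures = {("(", ")"), ("[", "]")}  # Expression start/stop pairs.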
Profile.__slots__
__slots__ = ()
@classmethod
Profile.from_keywords_v1
from_keywords_v1(Class, **wordtypes)
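A hedged usage sketch; the classification names passed as keywords here (keyword, coreword) are hypothetical, and how the remaining Profile fields are defaulted is not specified here:

    # Hypothetical wordtype classifications and identifier sets.
    profile = Profile.from_keywords_v1(
        keyword={"if", "else", "while"},
        coreword={"print", "len"},
    )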
Profile.words: typing.Mapping[str, typing.Set[str]]
Dictionary associating a classification identifier with the set of identifier strings that belong to it.
Profile.exclusions: typing.Set[typing.Tuple[str,str]]
Comment start and stop pairs delimiting an excluded area from the source.
Exclusions are given the second highest priority by Parser.
Profile.literals: typing.Set[typing.Tuple[str,str]]
The start and stop pairs delimiting a literal area within the syntax document. Primarily used for string quotations, but supports distinct stops for handling other cases as well.
Literals are given the highest priority by Parser.
Profile.enclosures: typing.Set[typing.Tuple[str,str]]
The start and stop pairs delimiting an expression.
Enclosures have the highest precedence during expression processing.
Profile.routers: typing.Set[str]
Operators used to designate a resolution path for selecting an object to be used.
Profile.operations: typing.Set[str]
Set of operators that can perform some manipulation on the objects associated with the adjacent identifiers.
Profile.terminators: typing.Set[str]
Operators used to designate the end of a statement, expression, or field.
Profile.operators: typing.Iterable[str]
Emit all unit operators employed by the language, associated with a rank and context effect. Operators may appear multiple times. Empty strings represent the End of Line.
Operators are emitted by their classification in the following order:
Operations
Routers
Terminators
Enclosures
Literals
Exclusions
The order is deliberate so that mappings may be built directly: later classes will overwrite earlier entries in cases of ambiguity.
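The intent can be illustrated by building such a mapping by hand from the classified sets in the documented order. This sketch reads the Profile fields directly, since the exact items emitted by operators are not reproduced here, and the classification labels are hypothetical:

    # `profile` is an assumed Profile instance.
    opmap = {}
    for classification, ops in [
        ("operation", profile.operations),
        ("router", profile.routers),
        ("terminator", profile.terminators),
    ]:
        for op in ops:
            opmap[op] = classification

    # Paired classes contribute both their start and stop strings; being
    # later in the order, they overwrite any ambiguous earlier entries.
    for classification, pairs in [
        ("enclosure", profile.enclosures),
        ("literal", profile.literals),
        ("exclusion", profile.exclusions),
    ]:
        for start, stop in pairs:
            opmap[start] = classification
            opmap[stop] = classification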
Parser
Keyword Oriented Syntax parser providing tokenization and region delimiting.
Instances do not hold state and methods of the same instance may be used by multiple threads.
Engineering
This is essentially a tightly coupled partial application of tokenize and delimit: from_profile builds the necessary parameters from a Profile instance, and the internal constructor, __init__, makes them available to the methods.
Applications should create and cache an instance for a given language.
Parser.integrate_switches
integrate_switches(tokens, context)
Qualify the syntax fields in tokens by interpreting switches.
Parameters
tokens: Iterable of triples produced by process_document or process_lines.
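A hedged sketch of the apparent flow; whether allocstack provides a suitable context for this method is an assumption made for illustration, as it is only documented for delimit:

    # `parser` and `lines` are assumed from the earlier sketches.
    context = parser.allocstack()  # Assumed compatible with this method.
    for qualified in parser.integrate_switches(parser.process_document(lines), context):
        ...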
Parser.from_profile
from_profile(Class, profile)
Primary constructor for Parser.
Instances should usually be cached when repeated use is expected, as some amount of preparation is performed by from_profile.
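A minimal caching sketch; the registry and the language-name key are hypothetical conveniences, not part of the interface:

    _parsers = {}

    def parser_for(language, profile):
        # Build the Parser once per language and reuse it thereafter.
        if language not in _parsers:
            _parsers[language] = Parser.from_profile(profile)
        return _parsers[language]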
Parser.__init__
__init__(self, profile, opset, opmap, delimiter, optable, exits, classify_id, classify_op)
WARNING: The initializer's parameters are subject to change; from_profile should be used to build instances.
Parser.process_line -> typing.Iterable[Tokens]
process_line(self, line)
Process a single line of syntax into tokens.
Parser.process_lines -> typing.Iterable[typing.Iterable[Tokens]]
process_lines(self, lines)
Process lines using context resets; tokenize and delimit multiple lines, resetting the context at the end of each line.
This is the recommended method for extracting tokens from a file for syntax documents that are expected to restate line context, have inaccurate profiles, or are incomplete.
The produced iterators may be run out of order as no parsing state is shared across lines.
Essentially, map(Parser.process_line, line_iter).
Parser.process_document -> typing.Iterable[typing.Iterable[Tokens]]
process_document(self, lines)
Process lines of a complete source code file using continuous context; tokenize and delimit multiple lines, maintaining the context across all lines.
This is the recommended method for extracting tokens from a file for syntax documents that are expected to not restate line context and have accurate profiles.
The produced iterators must be run in the produced order as the context is shared across instances.
Parser.allocstack
allocstack(self)
Allocate context stack for use with delimit.
Parser.delimit -> Tokens
delimit(self, context, tokens)
Insert switch tokens into an iteration of tokens marking the boundaries of expressions, comments and quotations.
context is manipulated during the iteration and maintains the nested state of comments. allocstack may be used to allocate an initial state.
This is a relatively low-level method; process_lines or process_document should normally be used.
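A hedged low-level sketch combining the pieces, roughly what the higher-level process methods are described as doing with a continuous context:

    # `parser` and `lines` are assumed from the earlier sketches.
    context = parser.allocstack()
    for line in lines:
        # tokenize produces raw tokens; delimit inserts the switch tokens
        # marking expression, comment, and quotation boundaries.
        for token in parser.delimit(context, parser.tokenize(line)):
            ...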
Parser.tokenize -> Tokens
tokenize(self, line)
Tokenize a string of syntax according to the profile.
Direct use of this is not recommended as boundaries are not signalled. process_line, process_lines, or process_document should be used. The raw tokens, however, are usable in contexts where boundary information is not desired or is not accurate enough for an application's use.