    #187 fix parsing for nested topologies, schema validation · 1f4c2358
    Leonardo Nobrega authored
    Incorrect topology reference in validator
    
    A problem happens when we import a GRENML file containing a parent and
    a child topology.
    
    The attachment export_failure_file.xml in ticket 187 has bad
    data: a node holds a reference to an institution (its owner), but
    the id stored in the node is not the institution's id. This file
    also has two topologies.
    
    It is not possible to observe the problem when we load the file
    through the GRENMap server's admin site because the import function,
    which calls the parse method in the library, passes false as the
    raise_error argument.
    
    So the validate method does not raise an exception; it only collects
    issues into a list and returns it. The import function discards that
    list, and everything below the parse call runs without errors.
    
    The server then imports the file successfully. However, it cannot
    export the file's data, because the write method calls validate
    without the false argument (GRENMLManager.write_to_output_stream).
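
The difference between the import and export paths can be sketched as follows. This is a minimal illustration, not the library's code: the validate function, its arguments other than raise_error, and the ValidationError class are hypothetical stand-ins.

```python
class ValidationError(Exception):
    """Hypothetical exception type used only in this sketch."""


def validate(nodes, institutions, raise_error=True):
    """Collect issues; raise only when raise_error is true."""
    errors = []
    for node_id, owner_id in nodes.items():
        if owner_id not in institutions:
            errors.append(f"node {node_id}: unknown owner {owner_id}")
    if errors and raise_error:
        raise ValidationError("; ".join(errors))
    return errors


nodes = {"n1": "inst-1", "n2": "inst-999"}  # n2 points at a missing owner
institutions = {"inst-1"}

# Import path: issues are returned, and a caller that ignores them sees no failure.
ignored = validate(nodes, institutions, raise_error=False)

# Export path: the same data now raises.
try:
    validate(nodes, institutions)
    export_failed = False
except ValidationError:
    export_failed = True
```

With the same bad data, the first call returns quietly and the second one fails, which matches the import-succeeds, export-fails behaviour described above.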
    
    How it happens
    
    The Validator associated with the GRENMLManager is a singleton; there is
    only one instance shared by all GRENMLManager instances.
    
    The XML parser in the Python library requires a handler object,
    which knows what to do for each element found in the XML.
    
    For every topology in the file, the handler creates a
    GRENMLManager (see TopologyHandler.startElement in
    grenml/parsing/grenml.py).
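
A rough sketch of that handler pattern, assuming the parser is Python's xml.sax (suggested by the startElement method name). The Manager class, the handler name, and the "Topology" element with an "id" attribute are simplifications for illustration, not the library's actual code.

```python
import xml.sax


class Manager:
    """Hypothetical stand-in for GRENMLManager."""

    def __init__(self, topology_id, parent=None):
        self.topology_id = topology_id
        self.parent = parent


class TopologySketchHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.stack = []    # managers for the topologies currently open
        self.created = []  # every manager created, in document order

    def startElement(self, name, attrs):
        if name == "Topology":
            parent = self.stack[-1] if self.stack else None
            manager = Manager(attrs.get("id"), parent)
            self.stack.append(manager)
            self.created.append(manager)

    def endElement(self, name):
        if name == "Topology":
            self.stack.pop()  # back to the enclosing topology


XML = b'<Topology id="parent"><Topology id="child"/></Topology>'
handler = TopologySketchHandler()
xml.sax.parseString(XML, handler)
created_ids = [m.topology_id for m in handler.created]
```

One manager is created per topology element, and the child's manager is created while the parent's element is still open.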
    
    The first element in the XML stream to be parsed is the root
    topology. On finding it, the parser creates a GRENMLManager. The
    single Validator instance receives a reference to this manager's
    topology.
    
    When the parser descends from the parent topology into the child, it
    creates a new manager through the handler. At this moment, the manager
    constructor overwrites the topology reference in the Validator: its
    previous value was the parent topology, the new value is the child
    topology.
    
    Eventually, the parser reaches the end of the child topology and
    returns to the outer scope, which is the parent topology. The
    Validator is now stale: it still references the child topology.
    Calling it through the manager associated with the parent topology,
    after the parser ends, will fail.
    
    The call through the manager omits the topology parameter (which the
    Validator uses when it recurses into a child topology), because from
    the manager's point of view, its topology is a root topology. The
    Validator's topology is the child, which has a parent. The method
    _validate_topology creates an error item due to this difference.
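
The sequence described above can be reproduced in a few lines. This is a simplified model under stated assumptions: Topology, Validator and Manager here are hypothetical miniatures, and the root check only imitates the kind of mismatch that _validate_topology reports.

```python
class Topology:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class Validator:
    """Singleton: every Manager gets the same instance."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.topology = None
        return cls._instance

    def validate(self, topology=None):
        # No argument means the caller treats self.topology as a root.
        root_call = topology is None
        target = self.topology if root_call else topology
        errors = []
        if root_call and target.parent is not None:
            errors.append(f"{target.name}: expected a root topology")
        return errors


class Manager:
    def __init__(self, name, parent=None):
        self.topology = Topology(name, parent)
        self.validator = Validator()
        self.validator.topology = self.topology  # overwrites the shared reference

    def validate(self):
        return self.validator.validate()  # topology parameter omitted


parent_mgr = Manager("parent")
errors_before = parent_mgr.validate()

# Descending into the child topology creates a second manager.
child_mgr = Manager("child", parent=parent_mgr.topology)
errors_after = parent_mgr.validate()
```

Before the child manager exists, validating through the parent manager reports nothing; afterwards the same call inspects the child topology and reports an error.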
    
    Schema validation
    
    This commit also introduces validation that uses the lxml library and the
    schema file grenml.xsd.
    
    XML schema validation will let us verify the names of the elements in
    a file, their attributes and the types of their children.
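
As an illustration of the kind of check this enables, here is a small sketch using lxml's XMLSchema class. The schema below is a toy written for this example, not grenml.xsd, and the element names are assumptions.

```python
from lxml import etree

# Toy schema (NOT grenml.xsd): a Topology needs an id and may contain Node children.
XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Topology">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Node" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))

good = etree.fromstring(b'<Topology id="t1"><Node id="n1"/></Topology>')
bad = etree.fromstring(b'<Topology id="t1"><Nod id="n1"/></Topology>')  # misspelled child

good_ok = schema.validate(good)
bad_ok = schema.validate(bad)
bad_errors = [entry.message for entry in schema.error_log]
```

The misspelled element name is rejected by the schema before any application-level validation runs.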
    
    Lxml refuses string (str) input that contains the XML prologue,
    which declares the encoding used in the string; it requires bytes
    in that case. The existing parse_stream method in GRENMLParser
    therefore became parse_byte_stream.
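
The lxml behaviour behind that rename can be seen with a short sketch. It uses etree.fromstring on a made-up one-element document rather than the library's parser methods.

```python
from lxml import etree

TEXT = '<?xml version="1.0" encoding="UTF-8"?><Topology id="t1"/>'

# A str carrying an encoding declaration is rejected with ValueError.
try:
    etree.fromstring(TEXT)
    str_rejected = False
except ValueError:
    str_rejected = True

# The same document as bytes parses normally.
root = etree.fromstring(TEXT.encode("utf-8"))
```

Passing bytes lets the parser honour the declared encoding itself, which is why a byte stream is the safer input type here.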
    
    The directory containing the schema files moved from the project root
    into "grenml". The way the directories are currently arranged makes
    pip create a "schema" directory under site-packages.