Tomaž Jerman b2abfbf613 Improve node splicing
Reuse existing spliced nodes to reduce the node count.
2020-03-22 12:08:20 +01:00

= Data Migration System

This package implements a graph-based data migration system.

== Algorithm

* Read and prepare the files in the provided directory.
A file's name represents a module handle, so make sure the two match.
* Import system users and take note of their references.
System users are needed by every migration, so they should be imported up front.
* Initialize the graph based on the provided files and their corresponding modules.
The system can automatically determine dependencies between modules based on their fields.
* Remove any cycles from the graph by splicing one of the nodes in each cycle.
A spliced node can only import its own data; it cannot manage its dependencies, as they might not yet be known.
Its dependencies are updated when its parent node is processed.
* Determine the "leaf" nodes, i.e. the nodes with no dependencies.
These can be imported immediately.
* Import each leaf node with satisfied dependencies.
Since leaf nodes have no remaining dependencies, they can be imported in parallel (@todo).
* When a node finishes importing, provide its mapping information to each of its parents, as they will need it.
Also add any parent nodes whose dependencies are now satisfied to the list of leaf nodes.
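
The steps above can be sketched as follows.
This is a minimal illustration in Python, not the package's actual code: the graph is assumed to be a plain mapping from a module handle to the set of handles it depends on, and the names `splice_cycles`, `import_order` and the `:spliced` suffix are hypothetical.
Cycles are found with a depth-first search; whenever an edge closes a cycle, it is redirected to a dependency-free spliced copy of the target node, and an existing spliced copy is reused if there is one.
Leaf nodes are then processed from a queue, and a parent joins the queue once all of its dependencies have been imported.

[source,python]
----
from collections import deque


def splice_cycles(deps):
    """Return a copy of `deps` (node -> set of dependencies) with all cycles removed."""
    graph = {node: set(ds) for node, ds in deps.items()}
    for ds in deps.values():
        for dep in ds:
            graph.setdefault(dep, set())  # nodes that only appear as dependencies

    state = {}  # node -> "active" (on the DFS stack) or "done"

    def visit(node):
        state[node] = "active"
        for dep in sorted(graph[node]):
            if state.get(dep) == "active":
                # node -> dep closes a cycle: depend on a spliced copy instead.
                spliced = dep + ":spliced"
                graph.setdefault(spliced, set())  # reuse an existing spliced node
                graph[node].discard(dep)
                graph[node].add(spliced)
                # The original node becomes a parent of its spliced copy, so it
                # can fill in the missing references once it is processed itself.
                graph[dep].add(spliced)
            elif dep not in state:
                visit(dep)
        state[node] = "done"

    for node in sorted(graph):
        if node not in state:
            visit(node)
    return graph


def import_order(graph):
    """Return the nodes of an acyclic `graph` in an order that satisfies dependencies."""
    remaining = {node: set(ds) for node, ds in graph.items()}
    parents = {node: set() for node in graph}
    for node, ds in graph.items():
        for dep in ds:
            parents[dep].add(node)

    # Leaf nodes have no unsatisfied dependencies and can be imported right away.
    leaves = deque(sorted(node for node, ds in remaining.items() if not ds))
    order = []
    while leaves:
        node = leaves.popleft()
        order.append(node)  # stand-in for the actual import of the node
        for parent in sorted(parents[node]):
            remaining[parent].discard(node)
            if not remaining[parent]:
                leaves.append(parent)  # all dependencies of the parent are satisfied
    return order
----

For two hypothetical modules that reference each other, `account` and `contact`, plus an independent `user` module, the resulting order is `account:spliced`, `user`, `contact`, `account`: the spliced copy is imported first without references, and the original `account` node is processed last, when those references can be resolved.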

== Migration Mapping

A big part of this system is its support for migration maps, i.e. which field of the original source maps to which module and under which field.

====
Currently, only simple conditions such as `type=specialType` are supported.
====

=== Algorithm

* Unmarshal the given `.map.json` file.
* For each entry of the given source:
** determine which map to use based on the provided `where` field and the row's content,
** update or create buffers based on the provided `map` entries.
* Flush the data.
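
The steps above can be sketched as follows.
This is a minimal illustration in Python under stated assumptions, not the package's implementation: source rows are assumed to be dictionaries of strings, a `where` condition is assumed to be a single `key=value` comparison, a `to` target is assumed to have the form `module.field`, and the first matching map is assumed to win.
The names `matches` and `apply_maps` are hypothetical, and returning the buffers stands in for the flush step.

[source,python]
----
import json


def matches(where, row):
    """Evaluate a simple `key=value` condition against a source row."""
    key, _, value = where.partition("=")
    return row.get(key.strip()) == value.strip()


def apply_maps(map_json, rows):
    """Distribute the fields of each source row into per-module buffers."""
    maps = json.loads(map_json)  # unmarshal the map definition
    buffers = {}  # module handle -> list of records to import

    for row in rows:
        for entry in maps:
            if not matches(entry["where"], row):
                continue
            records = {}  # module handle -> record built from this row
            for mapping in entry["map"]:
                module, _, field = mapping["to"].partition(".")
                records.setdefault(module, {})[field] = row.get(mapping["from"])
            for module, record in records.items():
                buffers.setdefault(module, []).append(record)
            break  # assumption: only the first matching map is applied

    return buffers  # a real implementation would flush these to storage
----

Rows that match no map are skipped in this sketch; whether the package skips them or reports an error is not covered here.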

=== Example

.source.map.json
[source,json]
----
[
  {
    "where": "type=type1",

    "map": [
      {
        "from": "id",
        "to": "splice_1.original"
      },
      {
        "from": "id",
        "to": "splice_2.original"
      },
      {
        "from": "id",
        "to": "splice.id"
      },

      {
        "from": "field1",
        "to": "splice.customName"
      },
      {
        "from": "field2",
        "to": "splice_1.customName"
      },
      {
        "from": "field3",
        "to": "splice_2.customName"
      }
    ]
  }
]
----
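
To see how a single source row fans out into several modules, the `map` entries can be grouped by their target module.
The helper below is a hypothetical sketch in Python, not part of the package, and assumes that every `to` value has the form `module.field`.

[source,python]
----
import json


def targets_by_module(map_json):
    """Group the `map` entries of every map by target module."""
    targets = {}  # module handle -> list of (source field, target field)
    for entry in json.loads(map_json):
        for mapping in entry["map"]:
            # Assumption: `to` is "<module handle>.<field name>".
            module, _, field = mapping["to"].partition(".")
            targets.setdefault(module, []).append((mapping["from"], field))
    return targets
----

Applied to the map above, this groups the six entries into three modules: `splice_1` and `splice_2` each receive the source `id` as `original` plus one `customName`, while `splice` receives `id` and `field1`.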