191 lines
5.2 KiB
Plaintext
= Data Migration System

This package implements a graph-based data migration system.

== Algorithm

* Read and prepare the files in the provided directory.
A file's name represents a module handle, so make sure that the two match.
* Import system users and take note of their references.
System users are needed by every migration, so they should be imported first.
* Initialize the graph based on the provided files and their corresponding modules.
The system can automatically determine dependencies based on the modules' fields.
* Remove any cycles from the graph by splicing one of the nodes in each cycle.
A spliced node can only import its own data; it cannot manage its dependencies, as they might not yet be known.
Its dependencies are updated when its parent node is processed.
* Determine the "leaf" nodes, i.e. the nodes with no dependencies.
These can be imported immediately.
* Import each leaf node with satisfied dependencies.
Since leaf nodes have no remaining dependencies, they can be imported in parallel (@todo).
* When a node finishes importing, provide its mapping information to each of its parents, as they will need it.
Also update the list of leaf nodes with the parent nodes whose dependencies are now satisfied.

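The ordering steps above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function names and the module names (`Account`, `Contact`) are hypothetical, and splicing is simplified to dropping the edge that closes a cycle and recording it so the reference could be updated later.

[source,python]
----
from collections import defaultdict


def splice_cycles(deps):
    """Return an acyclic copy of deps plus the list of dropped (node, dep) edges."""
    deps = {node: set(targets) for node, targets in deps.items()}
    spliced = []
    state = {}  # node -> "visiting" while on the current path, "done" afterwards

    def visit(node):
        state[node] = "visiting"
        for dep in sorted(deps[node]):
            if state.get(dep) == "visiting":
                # dep is an ancestor on the current path, so this edge closes a cycle
                deps[node].discard(dep)
                spliced.append((node, dep))
            elif dep not in state:
                visit(dep)
        state[node] = "done"

    for node in sorted(deps):
        if node not in state:
            visit(node)
    return deps, spliced


def import_order(deps):
    """Return nodes ordered so that each one follows all of its dependencies."""
    remaining = {node: set(targets) for node, targets in deps.items()}
    parents = defaultdict(set)
    for node, targets in remaining.items():
        for dep in targets:
            parents[dep].add(node)

    leaves = sorted(node for node, targets in remaining.items() if not targets)
    order = []
    while leaves:
        node = leaves.pop(0)
        order.append(node)  # the actual import would happen here
        for parent in sorted(parents[node]):
            remaining[parent].discard(node)
            if not remaining[parent]:
                leaves.append(parent)  # parent became a leaf
    return order


# Hypothetical modules: Account and User reference each other.
graph = {
    "Account": {"User"},
    "Contact": {"Account", "User"},
    "User": {"Account"},
}
acyclic, spliced = splice_cycles(graph)
order = import_order(acyclic)
----

With this input, the `User -> Account` edge is the one that gets spliced, which lets `User` be imported first.
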
== Migration Mapping

A big part of this system is its support for migration maps, i.e. which field from the original source maps into which module, and under which field.

====
Currently, only simple conditions such as `type=specialType` are supported.
====

=== Algorithm

* Unmarshal the given `.map.json`.
* For each entry of the given source:
** determine which map to use, based on the provided `where` field and the row's content,
** based on the provided `map` entries, update or create buffers.
* Flush the data.

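As a rough sketch of these steps, the snippet below applies a map to a few in-memory rows. It assumes that a `where` condition has the form `field=value`, that a missing `where` matches every row, and that only the first matching map is used; `matches`, `apply_maps`, and the sample rows are hypothetical and not part of the package.

[source,python]
----
import json


def matches(where, row):
    """Evaluate a simple `field=value` condition; no condition matches every row."""
    if not where:
        return True
    field, _, expected = where.partition("=")
    return row.get(field.strip()) == expected.strip()


def apply_maps(maps, rows):
    """Distribute source rows into per-target buffers according to the maps."""
    buffers = {}
    for row in rows:
        for entry in maps:
            if not matches(entry.get("where"), row):
                continue
            record = {}  # target -> {field: value} built from this row
            for mapping in entry["map"]:
                target, _, field = mapping["to"].partition(".")
                record.setdefault(target, {})[field] = row.get(mapping["from"])
            for target, fields in record.items():
                buffers.setdefault(target, []).append(fields)
            break  # assumption: the first matching map wins
    return buffers


maps = json.loads("""
[
  {
    "where": "type=type1",
    "map": [
      {"from": "id", "to": "splice.id"},
      {"from": "field1", "to": "splice.customName"},
      {"from": "id", "to": "splice_1.original"}
    ]
  }
]
""")

# Hypothetical source rows; the second one does not satisfy the condition.
rows = [
    {"id": "1", "type": "type1", "field1": "a"},
    {"id": "2", "type": "other", "field1": "b"},
]
buffers = apply_maps(maps, rows)
----

Flushing would then write each buffer to its target; that step is omitted here.
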
=== Example

.source.map.json
[source,json]
----
[
  {
    "where": "type=type1",

    "map": [
      {
        "from": "id",
        "to": "splice_1.original"
      },
      {
        "from": "id",
        "to": "splice_2.original"
      },
      {
        "from": "id",
        "to": "splice.id"
      },

      {
        "from": "field1",
        "to": "splice.customName"
      },
      {
        "from": "field2",
        "to": "splice_1.customName"
      },
      {
        "from": "field3",
        "to": "splice_2.customName"
      }
    ]
  }
]
----

== Joining Migration Sources

An important feature is the system's ability to construct a migration map from multiple migration sources.
For example, we may want to populate a `User` module with data from both `User.csv` and `SysUser.csv`.

=== Algorithm

* Unmarshal the given `.join.json`.
* For each migration node that defines a `.join.json`:
** determine all "joined" migration nodes that will be used in this join operation,
** create a `{ field: { id: [ value, ... ] } }` object for each base migration node, based on the joined nodes,
** when processing the migration node, respect the above-mentioned object and include the specified data.

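The lookup object from the steps above could be built roughly as shown here. This is a sketch under assumptions: it parses the `field->alias` and `[a,b]->alias` key syntax used in this section's examples, keys the outer level by alias rather than by field, and stores whole joined rows as the values. All function names and sample rows are hypothetical.

[source,python]
----
def split_fields(expr):
    """Turn `Id` or `[CreatedDate,CreatedById]` into a list of field names."""
    return [name.strip() for name in expr.strip("[]").split(",")]


def parse_join(spec):
    """Parse entries such as {"SubModRef->smod": "subMod.Id"}."""
    joins = []
    for left, right in spec.items():
        base_expr, _, alias = left.partition("->")
        node, _, join_expr = right.partition(".")
        joins.append((split_fields(base_expr), alias, node, split_fields(join_expr)))
    return joins


def build_index(joins, sources):
    """Build {alias: {key: [row, ...]}} from the joined nodes' rows."""
    index = {}
    for _, alias, node, join_fields in joins:
        by_key = index.setdefault(alias, {})
        for row in sources[node]:
            key = tuple(row.get(name) for name in join_fields)
            by_key.setdefault(key, []).append(row)
    return index


def joined_rows(row, joins, index):
    """Look up the joined rows for one row of the base migration node."""
    result = {}
    for base_fields, alias, _, _ in joins:
        key = tuple(row.get(name) for name in base_fields)
        result[alias] = index[alias].get(key, [])
    return result


joins = parse_join({"SubModRef->smod": "subMod.Id"})
# Hypothetical rows of the joined node.
sources = {"subMod": [{"Id": "s1", "field1": "x"}, {"Id": "s2", "field1": "y"}]}
index = build_index(joins, sources)
found = joined_rows({"Id": "b1", "SubModRef": "s2"}, joins, index)
----

Using a tuple as the key means the single-field and multi-field forms go through the same code path.
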
=== Example

`.join.json` files define how multiple migration nodes should be joined into a single module.

The example below specifies that the current module is constructed from itself and `subMod`, based on the relation between `SubModRef` and `subMod.Id`.
When creating a `.map.json` file, values from the join operation are available under the specified alias (`...->alias`).

.source.join.json
[source,json]
----
{
  "SubModRef->smod": "subMod.Id"
}
----

.source.map.json
[source,json]
----
[
  {
    "map": [
      {
        "from": "Id",
        "to": "baseMod.Id"
      },

      {
        "from": "baseField1",
        "to": "baseMod.baseField1"
      },

      {
        "from": "smod.field1",
        "to": "baseMod.SubModField1"
      }
    ]
  }
]
----

It is also possible to define a join operation on multiple fields at once. This is useful when a unique primary key is not available and must be constructed.
The following example uses the `CreatedDate` and `CreatedById` fields as an index.

[source,json]
----
{
  "[CreatedDate,CreatedById]->smod": "subMod.[CreatedDate,CreatedById]"
}
----

== Value Mapping

The system allows us to map a specific value from the provided `.csv` file into a value used by the system.
For example, we can map `In Progress` into `in_progress`.
The mapping also supports a default value, specified with the `*` wildcard.

=== Algorithm

* Unmarshal the given `.value.json`.
* Before applying a value to the given field, attempt to map the value:
** if the mapping is successful, use the mapped value,
** else if a default value exists, use the default value,
** else use the original value.

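The fallback chain above is small enough to sketch directly. The function name `map_value` is hypothetical, and the sample mapping mirrors the `sys_status` example used in this section.

[source,python]
----
def map_value(mapping, field, value):
    """Return the mapped value, then the `*` default, then the original value."""
    field_map = mapping.get(field, {})
    if value in field_map:
        return field_map[value]  # explicit mapping
    if "*" in field_map:
        return field_map["*"]  # default via the wildcard
    return value  # no mapping and no default: keep the original


mapping = {
    "sys_status": {
        "In Progress": "in_progress",
        "*": "new",
    }
}

mapped = map_value(mapping, "sys_status", "In Progress")
defaulted = map_value(mapping, "sys_status", "Unknown")
untouched = map_value(mapping, "other_field", "Unknown")
----

A field without any entry in the mapping falls through to the original value, since there is no wildcard to consult.
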
=== Example

The following value mapping translates the values of the `sys_status` field, each key into its corresponding value, with a default of `"new"` (`"*": "new"`).

.source.values.json
[source,json]
----
{
  "sys_status": {
    "In Progress": "in_progress",
    "Send to QA": "qa_pending",
    "Submit Job": "qa_approved",
    "*": "new"
  }
}
----

The system also provides support for arbitrary mathematical expressions.
To evaluate an expression, prefix the mapped value with `=EVL=`; for example `=EVL=numFmt(cell, \"%.0f\")`.

Variables:

* `cell`: the current cell.

The following example removes the decimal point from every `sys_rating` value in the given source.

[source,json]
----
{
  "sys_rating": {
    "*": "=EVL=numFmt(cell, \"%.0f\")"
  }
}
----

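To illustrate how a mapped value with the `=EVL=` prefix might be told apart from a plain value, here is a deliberately narrow sketch. It assumes that `numFmt` behaves like printf-style formatting of a numeric cell, and it only accepts a single function call whose arguments are `cell` or literals; the real expression support may be much broader. The names `resolve`, `evaluate`, and `FUNCTIONS` are hypothetical.

[source,python]
----
import ast

EVAL_PREFIX = "=EVL="


def num_fmt(value, fmt):
    """Assumed behaviour of numFmt: printf-style formatting of a number."""
    return fmt % float(value)


FUNCTIONS = {"numFmt": num_fmt}


def evaluate(expr, cell):
    """Evaluate a single call such as numFmt(cell, "%.0f") without using eval()."""
    call = ast.parse(expr, mode="eval").body
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Name)
        and call.func.id in FUNCTIONS
    ):
        raise ValueError(f"unsupported expression: {expr!r}")
    args = []
    for arg in call.args:
        if isinstance(arg, ast.Name) and arg.id == "cell":
            args.append(cell)  # the `cell` variable refers to the current cell
        elif isinstance(arg, ast.Constant):
            args.append(arg.value)
        else:
            raise ValueError(f"unsupported argument in: {expr!r}")
    return FUNCTIONS[call.func.id](*args)


def resolve(mapped, cell):
    """Return the evaluated expression, or the mapped value if it is not one."""
    if mapped.startswith(EVAL_PREFIX):
        return evaluate(mapped[len(EVAL_PREFIX):], cell)
    return mapped


rating = resolve('=EVL=numFmt(cell, "%.0f")', "4.0")
plain = resolve("in_progress", "In Progress")
----

Parsing the expression with `ast` instead of calling `eval` keeps the sketch from executing arbitrary code found in a mapping file.
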