author    Alex Browne <stephenalexbrowne@gmail.com>  2018-11-28 08:43:00 +0800
committer Alex Browne <stephenalexbrowne@gmail.com>  2018-12-05 06:24:48 +0800
commit    4061731245a8513e8d990f3af87e182fb674838b (patch)
tree      aa7825eb01925eebc5f1e471df4af6a2614ce94c
parent    96fdb9b76603ac5e4be654e52c52128f0490b3d8 (diff)
[pipeline] Add additional documentation to the README (#1328)
 packages/pipeline/README.md | 113 +++
 1 file changed, 113 insertions(+), 0 deletions(-)
diff --git a/packages/pipeline/README.md b/packages/pipeline/README.md
index df54de21c..c647950a2 100644
--- a/packages/pipeline/README.md
+++ b/packages/pipeline/README.md
@@ -31,3 +31,116 @@ yarn clean
 ```bash
 yarn lint
 ```
+
+### Migrations
+
+Create a new migration: `yarn migrate:create --name MigrationNameInCamelCase`
+
+Run migrations: `yarn migrate:run`
+
+Revert the most recent migration (CAUTION: may result in data loss!): `yarn migrate:revert`
+
+## Connecting to PostgreSQL
+
+Across the pipeline package, any code which accesses the database uses the
+environment variable `ZEROEX_DATA_PIPELINE_DB_URL`, which should be a properly
+formatted
+[PostgreSQL connection URL](https://stackoverflow.com/questions/3582552/postgresql-connection-url).
+
+## Test environment
+
+The easiest way to start Postgres is via Docker. Depending on your platform,
+you may need to prepend `sudo` to the following command:
+
+```bash
+docker run --rm -d -p 5432:5432 --name pipeline_postgres postgres:11-alpine
+```
+
+This will start a Postgres server with the default username and database name.
+You should set the environment variable as follows:
+
+```bash
+export ZEROEX_DATA_PIPELINE_DB_URL=postgresql://postgres@localhost/postgres
+```
+
+The first thing you will need to do is run the migrations:
+
+```bash
+yarn migrate:run
+```
+
+Now you can run scripts locally:
+
+```bash
+node packages/pipeline/lib/src/scripts/pull_radar_relay_orders.js
+```
+
+To stop the Postgres server (you may need to add `sudo`):
+
+```bash
+docker stop pipeline_postgres
+```
+
+Because the container was started with the `--rm` flag, stopping it will also
+remove all data from the database.
+
+If you prefer, you can instead install Postgres with, e.g.,
+[Homebrew](https://wiki.postgresql.org/wiki/Homebrew) or
+[Postgres.app](https://postgresapp.com/). As long as you set the
+`ZEROEX_DATA_PIPELINE_DB_URL` environment variable appropriately, any Postgres
+server will work.
+
+## Directory structure
+
+```
+.
+├── lib: Code generated by the TypeScript compiler. Don't edit this directly.
+├── migrations: Code for creating and updating database schemas.
+├── node_modules:
+├── src: All TypeScript source code.
+│   ├── data_sources: Code responsible for getting raw data, typically from a third-party source.
+│   ├── entities: TypeORM entities which closely mirror our database schemas. Some other ORMs call these "models".
+│   ├── parsers: Code for converting raw data into entities.
+│   ├── scripts: Executable scripts which put all the pieces together.
+│   └── utils: Various utils used across packages/files.
+└── test: All tests go here and are organized in the same way as the folder/file that they test.
+```
+
+## Adding new data to the pipeline
+
+1. Create an entity in the _entities_ directory. Entities directly mirror our
+   database schemas. We follow the practice of having "dumb" entities, so
+   entity classes should typically not have any methods.
+2. Create a migration using the `yarn migrate:create` command. Create/update
+   tables as needed. Remember to fill in both the `up` and `down` methods. Try
+   to avoid data loss as much as possible in your migrations.
+3. Create a class or function in the _data_sources_ directory for getting raw
+   data. This code should abstract away pagination and rate-limiting as much
+   as possible.
+4. Create a class or function in the _parsers_ directory for converting the
+   raw data into an entity. Also add tests in the _test_ directory to test the
+   parser.
+5. Create an executable script in the _scripts_ directory for putting
+   everything together. Your script can accept environment variables for
+   things like API keys. It should pull the data, parse it, and save it to the
+   database. Scripts should be idempotent and atomic (when possible). This
+   means your script may be responsible for determining **which** data needs
+   to be updated. For example, you may need to query the database to find the
+   most recent block number that we have already pulled, then pull new data
+   starting from that block number.
+6. Run the migrations and then run your new script locally and verify it works
+   as expected.
+
+#### Additional guidelines and tips:
+
+* Table names should be plural, with words separated by underscores (e.g.,
+  `exchange_fill_events`).
+* Any table which contains data that comes directly from a third-party source
+  should be namespaced in the `raw` PostgreSQL schema.
+* Words in database column names should be separated by underscores (e.g.,
+  `maker_asset_type`).
+* Field names in entity classes (like any other fields in TypeScript) should
+  be camel-cased (e.g., `makerAssetType`).
+* All timestamps should be stored as milliseconds since the Unix Epoch.
+* Use the `BigNumber` type for TypeScript code which deals with 256-bit
+  numbers from smart contracts or for any case where we are dealing with large
+  floating point numbers.
+* The [TypeORM documentation](http://typeorm.io/#/) is pretty robust and can
+  be a helpful resource.
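
The entity/parser conventions described in the README above (a "dumb" entity with no methods, snake_case column names mapped to camelCase fields, timestamps as milliseconds since the Unix Epoch) can be sketched roughly as follows. All names here (`ExchangeFillEvent`, `RawFill`, `parseFill`) are hypothetical illustrations, not actual pipeline code:

```typescript
// Hypothetical sketch of the data_sources -> parsers -> entities flow.

// A "dumb" entity that mirrors a database table; no methods, just fields.
// In the real package this would carry TypeORM decorators mapping fields
// to snake_case columns (e.g. makerAssetType -> maker_asset_type).
class ExchangeFillEvent {
    public makerAssetType!: string;
    public timestamp!: number; // milliseconds since the Unix Epoch
}

// Raw shape as returned by a hypothetical third-party data source.
interface RawFill {
    maker_asset_type: string;
    timestamp_seconds: number;
}

// Parser: converts raw third-party data into an entity.
function parseFill(raw: RawFill): ExchangeFillEvent {
    const entity = new ExchangeFillEvent();
    entity.makerAssetType = raw.maker_asset_type;
    // README convention: store timestamps in milliseconds, not seconds.
    entity.timestamp = raw.timestamp_seconds * 1000;
    return entity;
}
```

A script in _scripts_ would then fetch raw records from a data source, map them through the parser, and persist the resulting entities.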