Add python extractor

Nick Hahn 2022-09-14 14:04:35 +02:00
parent 52cd87650d
commit 7a422ef89b
4 changed files with 174 additions and 0 deletions

@ -8,6 +8,37 @@ Follow the guide over [here](https://docs.cypress.io/guides/getting-started/inst
You also require NodeJS. Run `npm ci` to install the required packages.
Create a new Python venv:
```
python -m venv venv
```
Activate the new environment (Linux):
```
source venv/bin/activate
```
Install the requirements:
```
pip install -r requirements.txt
```
## Execution
To start Cypress, run `npx cypress open`. Then click `E2E Testing` and run the tests using Electron. This step could be automated using the Cypress [API](https://docs.cypress.io/guides/guides/module-api).
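One possible way to skip the manual clicking is to run Cypress headlessly from Python. The sketch below does not use the Module API mentioned above (that is a NodeJS interface); it shells out to the `cypress run` CLI instead. The helper names `build_cypress_command` and `run_cypress` are hypothetical and not part of this repository.

```python
import shutil
import subprocess


def build_cypress_command(browser: str = "electron") -> list[str]:
    """Return the argv for a headless Cypress E2E run (hypothetical helper)."""
    return ["npx", "cypress", "run", "--e2e", "--browser", browser]


def run_cypress(browser: str = "electron") -> int:
    """Run Cypress headlessly and return its exit code."""
    # ASSUMPTION: NodeJS is installed and `npm ci` has already been run.
    if shutil.which("npx") is None:
        raise RuntimeError("npx not found; install NodeJS and run `npm ci` first")
    return subprocess.run(build_cypress_command(browser), check=False).returncode
```

Calling `run_cypress()` from the repository root would then replace the `npx cypress open` step, assuming the E2E specs need no interactive input.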
Start the extractor:
```
python hedgedoc-image.py meta_pad new_netloc
```
For example:
```
python hedgedoc-image.py https://md.margau.net/dbk-meta pad.hacknang.de
```
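The argument name `new_netloc` suggests that the host part of each pad URL is swapped for the new one. Whether `hedgedoc-image.py` does exactly this is an assumption. The following is a minimal sketch of such a rewrite using only the standard library; `replace_netloc` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_netloc(old_url: str, new_netloc: str) -> str:
    """Return old_url with its host (netloc) replaced by new_netloc."""
    parts = urlsplit(old_url)
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(netloc=new_netloc))


print(replace_netloc("https://md.margau.net/dbk-meta", "pad.hacknang.de"))
# prints "https://pad.hacknang.de/dbk-meta"
```

Scheme, path, query, and fragment are left untouched; only the host changes.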
## Produced files
The Python script produces a `pads.json` file, which contains the mapping from `old_url` to `new_url`.
All images land in `images/uploads`. Only images hosted on the `old_pads` URL are saved.
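The mapping in `pads.json` could be used to rewrite links in exported pad content. The sketch below assumes that `pads.json` is a flat JSON object of the form `{"old_url": "new_url", ...}`; the actual layout is not documented here, so adjust `load_mapping` if the file is structured differently. Both function names are hypothetical.

```python
import json
from pathlib import Path


def load_mapping(path: str = "pads.json") -> dict[str, str]:
    """Load the old_url -> new_url mapping."""
    # ASSUMPTION: pads.json is a flat JSON object {"old_url": "new_url", ...}.
    return json.loads(Path(path).read_text(encoding="utf-8"))


def rewrite_links(text: str, mapping: dict[str, str]) -> str:
    """Replace every old URL that occurs in text with its new URL."""
    # Longer URLs go first so a URL that is a prefix of another one
    # does not partially overwrite it.
    for old_url in sorted(mapping, key=len, reverse=True):
        text = text.replace(old_url, mapping[old_url])
    return text
```

For example, `rewrite_links(markdown_text, load_mapping())` would return the pad text with all known old URLs replaced.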