## Text Page Extractor

This step extracts text from a PDF file held in a field called `bytes`. Each extracted line is annotated with both a 
page number and a line number based on its position on the page. The line number is "best effort": 
multiple columns may be printed side by side on a page yet come from completely different parts of the PDF document.
The logic tries to reconstruct the reading sequence from the x,y locations of character runs.

### Structure
1 ⇒ 1

### Input
* One field is required: `bytes`, the PDF data


### Parameters

There are no parameters.

### Output
The input record with the PDF `bytes` field replaced with:

* `lines` - An array of objects representing text within the PDF.
 
Each object contains the following three fields:

* `page` - The page number
 
* `line` - The (logical) line number
 
* `text` - The text extracted
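
To make the output shape concrete, here is a minimal sketch of a record as a downstream step might see it, together with a small helper that joins the text per page. The sample values, the 1-based numbering and the helper name `text_by_page` are assumptions made for illustration; they are not taken from the step itself.

```python
from collections import defaultdict

# Hypothetical output record: the values are invented for illustration,
# and 1-based page/line numbering is an assumption.
record = {
    "lines": [
        {"page": 1, "line": 1, "text": "Quarterly Report"},
        {"page": 1, "line": 2, "text": "Revenue rose in the third quarter."},
        {"page": 2, "line": 1, "text": "Appendix"},
    ]
}


def text_by_page(record):
    """Join the extracted text of each page, ordered by logical line number."""
    pages = defaultdict(list)
    for entry in sorted(record["lines"], key=lambda e: (e["page"], e["line"])):
        pages[entry["page"]].append(entry["text"])
    return {page: "\n".join(texts) for page, texts in pages.items()}
```

Calling `text_by_page(record)` returns a dictionary keyed by page number, with each page's lines joined by newlines.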

### Notes

* The exact layout of the PDF can be hard to replicate, for example with multiple columns of text. The
   algorithm attempts to sort strings by y-coordinate to match the top-to-bottom flow of the page, so text from
   side-by-side columns may end up interleaved.
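
The ordering idea can be sketched as follows. This is not the step's actual implementation; it is a simplified illustration that assumes each character run carries an `x` and `y` position with `y` increasing down the page. The `Run` type, the `order_runs` function and the `y_tolerance` value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    x: float   # horizontal position of the run (assumed: increases to the right)
    y: float   # vertical position of the run (assumed: increases down the page)
    text: str


def order_runs(runs, y_tolerance=2.0):
    """Group runs into logical lines by y, then order each line left to right."""
    lines = []  # each item is [reference_y, [runs on that line]]
    for run in sorted(runs, key=lambda r: r.y):
        # A run joins the current line if its y is close to the line's first run.
        if lines and abs(run.y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(run)
        else:
            lines.append([run.y, [run]])
    return [
        {
            "line": number,
            "text": " ".join(r.text for r in sorted(group, key=lambda r: r.x)),
        }
        for number, (_, group) in enumerate(lines, start=1)
    ]
```

With this approach, two runs at nearly the same height are merged into one logical line even if they belong to different columns, which is why the line numbers are only a best effort on multi-column pages.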

### See Also

* Text Extractor