5. API reference

These are the modules in maturity order:

xleash A mini-language for “throwing the rope” around rectangular areas of Excel-sheets.
mappings Hierarchical string-like objects useful for indexing, that can be renamed/relocated at a later stage.
pandata A pandas-model is a tree of strings, numbers, sequences, dicts, pandas instances and resolvable URI-references, implemented by Pandel.
components Defines the building-blocks of a “model”:

5.1. Module: pandalone.xleash

A mini-language for “throwing the rope” around rectangular areas of Excel-sheets.

5.1.1. About

Any decent dataset is stored in csv. Nevertheless, many datasets are still trapped in excel-sheets.

XLeash defines a url-fragment notation (xl-ref) that renders the capturing of tables from sheets as practical as reading a csv, even when the exact position of those tables are not known beforehand.

An additional goal is to apply the same lassoing operation recursively, to build data-trees. To that end, the syntax supports filter transformations such as:

  • setting the dimensionality of the result tables,
  • creating higher-level objects from 2D capture-rect (dictionaries, numpy-arrays & dataframes).

It is based on the xlrd library, but is also checked for compatibility with the xlwings COM-client library. It requires numpy and (optionally) pandas. It is developed on python-3 but also tested on python-2 for compatibility.

5.1.2. Overview

The xl-ref notation extends ordinary A1 and RC excel coordinates with conditional traversing operations, based on the cell’s empty/full state. For instance, to extract a contiguous table near the A1 cell, and make a pandas.DataFrame out of it, use this:

from pandalone import xleash, SheetsFactory

shfac = SheetsFactory()
shfac.list_sheetnames('path/to/workbook.xlsx')
['Sheet1', ...]

## Search and capture the first contiguous table from the 1st sheet
#  as a pandas-DataFrame:
df = xleash.lasso('path/to/workbook.xlsx#0!A1(DR):..(DR):RLDU:["df"]',
                  sheets_factory=shfac)

## Assuming the sheet contains a single table, a lone `:` fetches
#  the same contents.  Additionally, it is possible
#  to skip the sheetname/sheet-index (1st sheet implied).
df = xleash.lasso('#:["df"]',
                  url_file='path/to/workbook.xlsx',
                  sheets_factory=shfac)

5.1.2.1. Xl-ref Syntax

[<url>]#[<sheet>!][<1st-edge>][:[<2nd-edge>][:<expansions>]][:<filters>]

5.1.2.2. Annotated Example

target-moves─────┐
landing-cell──┐  │
             ┌┤ ┌┤
            #C3(UL):..(RD):RULD:["pipe": ["odict", "recursive"]]
             └─┬──┘ └─┬──┘ └┬─┘ └──────────────┬───────────────┘
1st-edge───────┘      │     │                  │
2nd-edge──────────────┘     │                  │
expansions──────────────────┘                  │
filters────────────────────────────────────────┘

Which means:

  1. Target the 1st edge of the capture-rect by starting from C3 landing-cell. If it is a full-cell, stop, otherwise start moving above and to the left of C3 and stop on the first full-cell;
  2. continue from the last target and travel the exterior row and column right and down, stopping on their last full-cell;
  3. capture all the cells between the 2 targets.
  4. try expansions in all directions while any neighbouring full-cells exist;
  5. finally filter the values of the capture-rect to wrap them up in an ordered-dictionary, and dive into its values searching for xl-refs, and replace them.
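The targeting of steps 1-2 can be sketched in plain Python. This is a simplified toy under stated assumptions (it ignores move-modifiers and the finer termination-rule details), not xleash's implementation:

```python
# A toy sketch of "search-opposite" targeting: walk the primary move
# until the sheet margin, then shift once along the secondary move and
# re-scan, stopping on the first full-cell.

DELTAS = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

def target_opposite(states, land, moves):
    nrows, ncols = len(states), len(states[0])
    dr1, dc1 = DELTAS[moves[0]]
    dr2, dc2 = DELTAS[moves[1]] if len(moves) > 1 else (0, 0)
    r0, c0 = land
    while 0 <= r0 < nrows and 0 <= c0 < ncols:
        r, c = r0, c0
        while 0 <= r < nrows and 0 <= c < ncols:
            if states[r][c]:
                return r, c                  # first full-cell met
            r, c = r + dr1, c + dc1
        if not (dr2 or dc2):
            break
        r0, c0 = r0 + dr2, c0 + dc2          # shift to the next row/column
    raise ValueError('no target found (xleash raises EmptyCaptureException)')

states = [[0, 0, 0, 0],       # a tiny states-matrix: 1 = full-cell
          [0, 0, 1, 1],
          [0, 1, 1, 1]]
assert target_opposite(states, (0, 0), 'DR') == (2, 1)   # like A1(DR)
```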

5.1.2.3. Basic Usage

The simplest way to lasso a xl-ref is through lasso(). A common task is to capture all non-empty cells of the 1st workbook-sheet but without any bordering nulls:

>>> from pandalone import xleash

>>> values = xleash.lasso('path/to/workbook.xlsx#:')  

Assuming that the full-cells of the 1st sheet of the workbook on disk are those marked with 'X', then the resulting capture-rect of the above call would be a 2D list-of-lists with the values contained in C2:E4:

  A B C D E
1    ┌─────┐
2    │    X│
3    │X    │
4    │  X  │
5    └─────┘

If another sheet is desired, add its name or 0-based ordinal immediately after #, separated by a ! from the rest of the xl-ref - which in that case might be empty:

>>> lasso = xleash.lasso
>>> lasso('Book.xlsx#Sheet1!') == lasso('Book.xlsx#0!') == lasso('Book.xlsx#:') 
True

If you do not wish to let the library read your workbooks, you can invoke the function with a pre-loaded sheet. Here we will use the utility ArraySheet with a more complicated xl-ref expression:

>>> sheet = xleash.ArraySheet([[None, None,  'A',   None],
...                          [None, 2.2,   'foo', None],
...                          [None, None,   2,    None],
...                          [None, None,   None, 3.14],
... ])
>>> xleash.lasso('#A1(DR):..(DR):RULD', sheet=sheet)
[[None, 'A'],
 [2.2, 'foo'],
 [None, 2]]

The capture-rect in this case was B1:C3, as can be seen by inspecting the st and nd fields of the full Lasso returned:

>>> xleash.lasso('#A1(DR):..(DR):RULD', sheet=sheet, return_lasso=True)
Lasso(xl_ref='#A1(DR):..(DR):RULD',
      url_file=None,
      sh_name=None,
      st_edge=Edge(land=Cell(row='1', col='A'), mov='DR', mod=None),
      nd_edge=Edge(land=Cell(row='.', col='.'), mov='DR', mod=None),
      exp_moves='RULD',
      call_spec=None,
      sheet=ArraySheet(SheetId(book='wb', ids=['sh', 0]),
                             [[None None 'A' None]
                              [None 2.2 'foo' None]
                              [None None 2 None]
                              [None None None 3.14]]),
      st=Coords(row=0, col=1),
      nd=Coords(row=2, col=2),
      values=[[None, 'A'],
              [2.2, 'foo'],
              [None, 2]],
      base_coords=None,
      ...

For explicit control over the configuration parameters and the opening of workbooks, use separate instances of Ranger and SheetsFactory, which are the workhorses of this library:

>>> with xleash.SheetsFactory() as sf:
...     sf.add_sheet(sheet, wb_ids='foo_wb', sh_ids='Sheet1')
...     ranger = xleash.Ranger(sf, base_opts={'verbose': True})
...     ranger.do_lasso('foo_wb#Sheet1!__').values
3.14

Notice that it returned a scalar value, since we specified only the 1st edge as '__', which points to the bottom row and right-most column of the sheet.

Alternatively, you can call make_default_Ranger() to extend the library’s defaults.

5.1.2.4. More Syntax Examples

Another typical but more advanced case is when a sheet contains a single table with a “header”-row and an “index”-column. There are (at least) 3 ways to capture it, beyond specifying the exact coordinates:

  A B C D E
1  ┌───────┐     B2:E4          ## Exact referencing.
2  │  X X X│     ^^.__  or :    ## From top-left full-cell to bottom-right.
3  │X X X X│     A1(DR):__:U1   ## Start from A1 and move down and right
4  │X X X X│                    #    until B3; capture till bottom-left;
   └───────┘                    #    expand once upwards (to header row).
                 A1(RD):__:L1   ## Start from A1 and move down by row
                                #    until C1; capture till bottom-left;
                                #    expand once left (to index column).
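Once such a table is captured, splitting the header-row and index-column off the 2D values is plain Python. The values below merely stand in for what a lasso over such a table might return; the cell contents are invented for illustration:

```python
# Hypothetical capture of a table with a header-row and an index-column:
values = [['id', 'h1', 'h2'],
          ['r1', 1, 2],
          ['r2', 3, 4]]

header, *rows = values
# Map each index value to a dict of its row, keyed by the header:
table = {row[0]: dict(zip(header[1:], row[1:])) for row in rows}
assert table['r2'] == {'h1': 3, 'h2': 4}
```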

Note that if B1 were full, the results would still be the same, because an expansion only happens if a full-cell is found in the newly-traversed row/column.

In case the sheet contains more than one disjoint table, the bottom-right cell of the sheet would not coincide with the table-end, so the handy last two xl-refs above would not work.

For that we may resort to dependent referencing for the 2nd edge, and define its position in relation to the 1st target:

  A B C D E
1  ┌─────┐    _^:..(LD+):L1     ## Start from top-right(E2) and target left
2  │  X X│                      #    left(D2); from there capture left-down
3  │X X X│                      #    till 1st empty-cell(C4, regardless of
4  │X X X│                      #    col/row order); expand left once.
   └─────┘    ^_(U):..(UR):U1   ## Start from B5 and target 1st cell up;
5         X                     #    capture from there till D3; expand up.

In the presence of empty-cells breaking the exterior row/column of the 1st landing-cell, the capturing becomes more intricate:

  A B C D E
1  ┌─────┐      B2:D_
2  │  X X│      A1(RD):..(RD):L1D
3  │X X  │      D_:^^
4  │X    │      A^(DR):D_:U
5  │  X  │X
   └─────┘


  A B C D E
1    ┌───┐      ^^(RD):..(RD)
2    │X X│      _^(R):^.(DR)
3   X│X  │
     └───┘
4   X
5     X   X


  A B C D E
1  ┌───┐        B2:C4
2  │  X│X       A1(RD):^_
3  │X X│        C_:^^
4  │X  │        A^(DR):C_:U
5  │  X│  X     ^^(RD):..(D):D
   └───┘        D2(L+):^_

See also

Example spreadsheet: xleash.xlsx

5.1.3. Definitions

lasso
lassoing

It may denote 3 things:

  • the whole procedure of parsing the xl-ref syntax, capturing values from spreadsheet rect-regions and sending them through any filters specified in the xl-ref;
  • the lasso() and/or Ranger.do_lasso() functions performing the above job;
  • the Lasso storing intermediate and final results of the above algorithm.
xl-ref

Any url with its fragment abiding to the syntax defined herein.

  • The fragment describes how to capture rects from excel-sheets, and it is composed of 2 edge references followed by expansions and filters.
  • The file-part should resolve to an excel-file.
parse
parsing
The stage where the input string gets splitted and checked for validity against the xl-ref syntax.
edge

An edge might signify:

In all cases above there are 2 instances; the 1st and 2nd.

1st
2nd

It may refer to the 1st/2nd:

The 1st-edge supports absolute coordinates only, while the 2nd-edge also supports coordinates dependent on the 1st target-cell.

landing-cell
The cell identified by the coordinates of the edge alone.
target-cell
target-rect
The bounding cell identified after applying target-moves on the landing-cell.
target
targeting

The process of identifying any target-cell bounding the target-rect.

Note that in the case of a dependent 2nd edge, the target-rect would always be the same, irrespective of whether target-moves denoted a row-by-row or column-by-column traversal.

capture
capturing

It is the overall procedure of:

  1. targeting both edge refs to come up with the target-rect;
  2. performing expansions to identify the capture-rect;
  3. extracting the values and feed them to filters.
capture-rect
capture-cell
The rectangular-area of the sheet denoted by the two capture-cells identified by capturing, that is, after applying expansions on target-rect.
directions
The 4 primitive directions that are denoted with one of the letters LURD. They are used to express both target-moves and expansions.
coordinate
coordinates
Any pair of row/column coordinates specifying cell positions (i.e. landing-cell, target-cell, bounds of the capture-rect), written as the first part of the edge syntax, or implicitly resolved. They can be expressed in A1 or RC format, or as a zero-based (row, col) tuple (num). Each coordinate might be absolute or dependent, independently.
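For instance, converting an absolute A1 coordinate into the zero-based num form can be sketched like this (a hypothetical helper, not the library's parser):

```python
import re

def a1_to_num(a1):
    """Convert an A1-style coordinate, e.g. 'C3', to a (row, col) tuple."""
    col_s, row_s = re.fullmatch(r'([A-Z]+)(\d+)', a1).groups()
    col = 0
    for ch in col_s:                       # column letters are base-26
        col = col * 26 + ord(ch) - ord('A') + 1
    return int(row_s) - 1, col - 1

assert a1_to_num('A1') == (0, 0)
assert a1_to_num('C3') == (2, 2)
assert a1_to_num('AA11') == (10, 26)
```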
traversing
traversal-operations
Either the target-moves or the expansion-moves that comprise the capturing.
target-moves

Specify the cell traversing order while targeting, using pairs of primitive directions. The pairs UD and LR (and their inverses) are invalid. E.g. DR means:

“Start going right, column-by-column, traversing each column from top to bottom.”
move-modifier
One of the + and - chars that may trail the target-moves, defining which termination-rule to follow if the landing-cell is a full-cell, i.e. A1(RD+)
expansions
expansion-moves

Due to state-changes on the ‘exterior’ cells, the capture-rect might be smaller than a wider contiguous but “convex” rectangular area.

The expansions attempt to remedy this by providing for expanding in arbitrary directions, accompanied by a multiplicity for each one. If the multiplicity is unspecified, infinity is assumed, so expansion continues until an empty/full row/column is met.

absolute

Any cell row/col identified with column-characters, row-numbers, or the following special-characters:

  • underscore (_), for the bottom-most row / right-most column with full-cells, and
  • accent (^), for the top-most row / left-most column with full-cells.

dependent
base-cell

A landing-cell with any coordinate identified by a dot (.), which resolves to the base-coordinate depending on which edge it refers to:

An edge might contain a “mix” of absolute and dependent coordinates.

state
full-cell
empty-cell
A cell is full when it is not empty / blank (in Excel’s parlance).
states-matrix
A boolean matrix denoting the state of the cells, having the same size as the sheet it was derived from.
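A states-matrix could be derived from raw cell values like this (a sketch of what a backend's get_states_matrix() computes; here, only None and the empty string count as blank):

```python
import numpy as np

def states_matrix(values):
    """Return a boolean ndarray: True wherever a cell is full."""
    return np.array([[cell not in (None, '') for cell in row]
                     for row in values])

sm = states_matrix([[None, 'A'], ['', 3.14]])
assert sm.tolist() == [[False, True], [False, True]]
```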
state-change
Whether we are traversing from an empty-cell to a full-cell, or vice-versa, while targeting.
termination-rule

The condition for stopping the targeting while traversing from the landing-cell. There are 2 rules: search-same and search-opposite.

See also

Check Target-termination enactment for the enactment of the rules.

search-opposite
The target-cell is the FIRST full-cell found while traveling from the landing-cell according to the target-moves.
search-same
The coordinates of the target-cell are given by the LAST full-cell on the exterior column/row according to the target-moves; the order of the moves is insignificant in that case.
exterior
The column and the row of the landing-cell; the search-same termination-rule is triggered only by ‘full-cells’ on them.
filter
filters
The last part of the xl-ref specifying predefined functions to apply for transforming the cell-values of capture-rect, abiding to the json syntax. They may be bulk or element-wise.
bulk
bulk-filter
A filter treating capture-rect values as a whole, e.g. transposing arrays, is_empty.
element-wise
element-wise-filter
A filter diving into capture-rect values, e.g. python-eval.
call-specifier
call-spec

The structure to specify some function call in the filter part; it can be either a json string, list or object, like these:

  • string: "func_name"
  • list: ["func_name", ["arg1", "arg2"], {"k1": "v1"}] where the last 2 parts are optional and can be given in any order;
  • object: {"func": "func_name", "args": ["arg1"], "kwds": {"k":"v"}} where the args and kwds are optional.

If the outer-most filter is a dictionary, an 'opts' kwd is popped-out as the opts.
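Normalizing the three forms into a single (func, args, kwds) triple can be sketched as follows (a hypothetical helper, not the library's parser):

```python
def normalize_call_spec(spec):
    """Turn any of the 3 call-spec forms into a (func, args, kwds) triple."""
    if isinstance(spec, str):                    # string form
        return spec, [], {}
    if isinstance(spec, list):                   # list form: args/kwds in any order
        func, args, kwds = spec[0], [], {}
        for part in spec[1:]:
            if isinstance(part, list):
                args = part
            elif isinstance(part, dict):
                kwds = part
        return func, args, kwds
    # object form: args and kwds optional
    return spec['func'], spec.get('args', []), spec.get('kwds', {})

assert normalize_call_spec("df") == ("df", [], {})
assert normalize_call_spec(["f", {"k": 1}, ["a"]]) == ("f", ["a"], {"k": 1})
assert normalize_call_spec({"func": "f", "args": ["a"]}) == ("f", ["a"], {})
```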

opts
Key-value pairs affecting the lassoing (e.g. opening xlrd-workbooks). Read the code to be sure what are the available choices :-( They are a combination of options specified in code (i.e. in lasso()) and those extracted from filters by the ‘opts’ key, and they are stored in the Lasso.
backend
backends

IO level object providing the actual spreadsheet cells for capturing. Each backend may provide for its workbooks and sheets corresponding to:

  • different implementations (e.g. xlrd or xlwings library), or
  • different origins (e.g. file-based, network-based per url).

The decision which backend to use is taken by the sheet-factory following a bidding process.

sheets-factory
IO level object acting as the caching manager for spreadsheets fetched from different backends. The caching happens per spreadsheet.
bid
backend-bidding
All backends are asked to provide their willingness to handle some xl-ref (see SimpleSheetFactory.decide_backend()). For a sibling sheet, the parent backend is always used.
sheet
spreadsheet
IO level object that acts as the container of cells.

5.1.4. Details

5.1.4.1. Target-moves

There are 12 target-moves named with a single or a pair of letters denoting the 4 primitive directions, LURD:

        U
 UL◄───┐▲┌───►UR
LU     │││     RU
 ▲     │││     ▲
 │     │││     │
 └─────┼│┼─────┘
L◄──────X──────►R
 ┌─────┼│┼─────┐
 │     │││     │
 ▼     │││     ▼
LD     │││     RD
 DL◄───┘▼└───►DR
        D

- The 'X' at the center marks the starting cell.

So a RD move means “traverse cells first by rows then by columns”, or, described more lengthily:

“Start moving right till the 1st state-change, and then move down to the next row, and start traversing right again.”
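That order can be sketched as a generator (a toy illustration: it enumerates every cell, whereas the real traversal stops on state-changes):

```python
def traverse_rd(nrows, ncols, start=(0, 0)):
    """Yield cells in RD order: right along each row, then down a row."""
    r0, c0 = start
    for r in range(r0, nrows):
        for c in range(c0, ncols):
            yield (r, c)

assert list(traverse_rd(2, 2)) == [(0, 0), (0, 1), (1, 0), (1, 1)]
```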

5.1.4.2. Target-cells

Using these moves we can identify a target-cell in relation to the landing-cell. For instance, given this xl-sheet below, there are multiple ways to identify (or target) the non-empty values X, below:

  A B C D E F
1
2
3     X        ──────► C3    A1(RD)   _^(L)      F3(L)
4         X    ──────► E4    A4(R)    _4(L)      D1(DR)
5   X          ──────► B5    A1(DR)   A_(UR)     _5(L)
6           X  ──────► F6    __       _^(D)      A_(R)

- The 'X' signifies non-empty cells.

So we can target cells with “absolute coordinates”, the usual A1 notation, augmented with the following special characters:

  • underscore (_) for bottom/right, and
  • accent(^) for top/left

columns/rows of the sheet with non-empty values.

When no LURD moves are specified, the target-cell coincides with the starting one.
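Resolving those special characters against pre-computed sheet margins can be sketched as follows (a simplified toy: single-letter columns only; the margins are zero-based (row, col) pairs for the top-left/bottom-right full-cells of the example sheet above):

```python
def resolve_coord(coord, margin_up, margin_dn, axis):
    """axis 0 = row, 1 = column; '^' and '_' resolve to the margins."""
    if coord == '^':
        return margin_up[axis]
    if coord == '_':
        return margin_dn[axis]
    # plain coordinates: 1-based row numbers, single column letters
    return int(coord) - 1 if axis == 0 else ord(coord) - ord('A')

up, dn = (2, 1), (5, 5)          # full-cells of the sheet span B3:F6
assert resolve_coord('^', up, dn, axis=0) == 2    # top-most full row (3)
assert resolve_coord('_', up, dn, axis=1) == 5    # right-most full col (F)
assert resolve_coord('4', up, dn, axis=0) == 3    # plain row number
```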

See also

Target-termination enactment section

5.1.4.3. Capturing

To specify a complete capture-rect we need to identify a 2nd cell. The 2nd target-cell may be specified:

  • either with absolute coordinates, as above, or
  • with dependent coords, using the dot(.) to refer to the 1st cell.

In the above example-sheet, here are some ways to specify refs:

  A  B C D E  F
1

2
      ┌─────┐
   ┌──┼─┐   │
3  │  │X│   │
   │┌─┼─┼───┼┐
4  ││ │ │  X││
   ││ └─┼───┴┼───► C3:E4   A1(RD):..(RD)   _^(L):..(DR)   _4(L):A1(RD)
5  ││X  │    │
   │└───┼────┴───► B4:E5   A_(UR):..(RU)   _5(L):1_(UR)    E1(D):A.(DR)
6  │    │     X
   └────┴────────► B3:C6   A1(RD):^_       ^^:C_           C_:^^

Warning

Of course, the above rects WILL FAIL since the target-moves will stop immediately due to X values being surrounded by empty-cells.

But the above diagram was just to convey the general idea. To make it work, all the in-between cells of the peripheral rows and columns should have been non-empty as well.

Note

The capturing moves from 1st target-cell to 2nd target-cell are independent from the implied target-moves in the case of dependent coords.

More specifically, the capturing will always fetch the same values regardless of “row-first” or “column-first” order; this is not the case with targeting (LURD) moves.

For instance, to capture B4:E5 in the above sheet we may use _5(L):E.(U). In that case the target cells are B5 and E4 and the target-moves to reach the 2nd one are UR which are different from the U specified on the 2nd cell.

5.1.4.4. Target-termination enactment

The guiding principle for when to enact each rule is to always capture a matrix of full-cells.

So both move-modifiers apply only when the landing-cell is a full-cell, and - actually makes sense only when the 2nd edge is dependent.

If the termination condition is not met, an EmptyCaptureException is raised, which is translated to an empty capture-rect by Ranger when opts contain {"no_empty": false} (default).

5.1.4.5. Expansions

Captured-rects (“values”) may be limited due to empty-cells in the 1st row/column traversed. To overcome this, the xl-ref may specify expansion directions using a 3rd :-section, like that:

_5(L):1_(UR):RDL1U1

This particular case means:

“Try expanding Right and Down repeatedly and then try once Left and Up.”

Expansion happens on a row-by-row or column-by-column basis, and terminates when an entirely empty (or entirely non-empty) line is met.
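One expansion step can be sketched like this (a simplification: real expansions also honour multiplicities and keep retrying each direction until none grows):

```python
# Grow the rect one line in a direction only if the new line
# contains at least one full-cell; otherwise leave it unchanged.
def expand_once(states, rect, direction):
    (r1, c1), (r2, c2) = rect
    nrows, ncols = len(states), len(states[0])
    if direction == 'U' and r1 > 0 and any(states[r1 - 1][c1:c2 + 1]):
        return (r1 - 1, c1), (r2, c2)
    if direction == 'D' and r2 < nrows - 1 and any(states[r2 + 1][c1:c2 + 1]):
        return (r1, c1), (r2 + 1, c2)
    if direction == 'L' and c1 > 0 and any(row[c1 - 1] for row in states[r1:r2 + 1]):
        return (r1, c1 - 1), (r2, c2)
    if direction == 'R' and c2 < ncols - 1 and any(row[c2 + 1] for row in states[r1:r2 + 1]):
        return (r1, c1), (r2, c2 + 1)
    return rect  # no full-cell in the next line: do not expand

states = [[0, 1, 0],
          [1, 1, 0],
          [0, 0, 0]]
assert expand_once(states, ((1, 1), (1, 1)), 'U') == ((0, 1), (1, 1))
assert expand_once(states, ((1, 1), (1, 1)), 'R') == ((1, 1), (1, 1))
```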

Example-refs are given below for capturing the 2 marked tables:

  A  B C D E F  G
1
   ┌───────────┐
   │┌─────────┐│
2  ││  1 X X  ││
   ││         ││
3  ││X X   X X││
   ││         ││
4  ││X X X 2 X││
   ││         ││
5  ││X   X X X││
   └┼─────────┼┴──► A1(RD):..(RD):DRL1
6   │X        │
    └─────────┴───► A1(RD):..(RD):L1DR       A_(UR):^^(RD)
7               X

- The 'X's signify non-empty cells.
- The '1' and '2' signify the identified target-cells.

5.1.5. Plugin Extensions

The xleash library already uses setuptools entry-points to attach backends and pandas filters. Read init_plugins() to learn how to implement other plugins.

5.1.6. API

  • User-facing higher-level functionality:

    Lasso(xl_ref, url_file, sh_name, st_edge, …) All the fields used by the algorithm, populated stage-by-stage by Ranger.
    lasso(xlref[, sheets_factory, base_opts, …]) High-level function to lasso around spreadsheet’s rect-regions according to xl-ref strings by using internally a Ranger .
    Ranger(sheets_factory[, base_opts, …]) The director-class that performs all stages required for “throwing the lasso” around rect-values.
    Ranger.do_lasso(xlref, **context_kwds) The director-method that does all the job of throwing a lasso around spreadsheet’s rect-regions according to xl-ref.
    make_default_Ranger([sheets_factory, …]) Makes a defaulted Ranger.
    get_default_opts([overrides]) Default opts used by lasso() when constructing its internal Ranger.
  • Related to capturing algorithm:

    resolve_capture_rect(states_matrix, …[, …]) Performs targeting, capturing and expansions based on the states-matrix.
    coords2Cell(row, col) Make A1 Cell from resolved coords, with rudimentary error-checking.
    EmptyCaptureException Thrown when targeting fails.
    installed_filters
    xlwings_dims_call_spec() A list call-spec for _redim_filter() filter that imitates results of xlwings library.
  • Related to parsing and basic structures used throughout (module pandalone.xleash._parse):

    parse_xlref
    parse_expansion_moves
    parse_call_spec
    Cell
    Coords
    Edge
    
  • IO back-end functionality:

    io_backends
    _sheets.SheetsFactory
    _sheets.ABCSheet.read_rect
    _sheets.ArraySheet
    _sheets.ABCSheet
    _xlrd.XlrdSheet
    _xlrd.open_sheet
  • Plugin related:

    _init_plugins
    _plugins_installed
    _PLUGIN_GROUP_NAME
    
pandalone.xleash._init_plugins(plugin_group_name='pandalone.xleash.plugins')[source]

Discover and load plugins.

The xleash library already uses setuptools entry-points to attach backend Sheet and pandas filters.

You may re-invoke after some pip install <some-xleash-plugin>.

To implement a new plugin, you have to package your code as a regular python distribution and add the following declaration inside its setup.py:

setup(
    # ...
    entry_points = {
        'pandalone.xleash.plugins': [
            'plugin_1 = <foo.plugin.module>:<plugin-install-func>',  ## Load & install.
            'plugin_2 = <bar.plugin.module>',                        ## Load only.
        ]
    }
)

The plugins are initialized during import time in a 2-stage procedure by init_plugins(). A plugin is loaded and optionally installed if the setup-configuration above specifies a no-args <plugin-install-func> callable. Any collected <plugin-install-func> callables are invoked AFTER all plugin-modules have finished loading.
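The 2-stage procedure can be sketched with the stdlib importlib.metadata (the library itself uses setuptools entry-points; the group name is the one declared above):

```python
from importlib.metadata import entry_points

def load_plugins(group='pandalone.xleash.plugins'):
    """Stage 1: load every entry-point; stage 2: call the install-funcs."""
    eps = entry_points()
    # `select` exists on Python >= 3.10; older versions return a dict.
    group_eps = eps.select(group=group) if hasattr(eps, 'select') else eps.get(group, [])
    installers = []
    for ep in group_eps:
        obj = ep.load()            # stage 1: import the plugin module (or attr)
        if callable(obj):          # a <plugin-install-func> was declared
            installers.append(obj)
    for install in installers:     # stage 2: invoked AFTER all modules loaded
        install()

load_plugins(group='no.such.group')  # no plugins registered: a silent no-op
```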

Tip

For example, study this project how it sets backend and filters.

Warning

When appending into “hook” lists during installation, remember to avoid re-inserting duplicate items. In general try to well-behave even when plugins are initialized multiple times!

pandalone.xleash._PLUGIN_GROUP_NAME = 'pandalone.xleash.plugins'

Used to discover setuptools extension-points.

pandalone.xleash.resolve_capture_rect(states_matrix, up_dn_margins, st_edge, nd_edge=None, exp_moves=None, base_coords=None)[source]

Performs targeting, capturing and expansions based on the states-matrix.

To get the margin_coords, use one of:

Its results can be fed into read_capture_values().

Parameters:
  • states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
  • up_dn_margins ((Coords, Coords)) – the top-left/bottom-right coords with full-cells
  • st_edge (Edge) – “uncooked” as matched by regex
  • nd_edge (Edge) – “uncooked” as matched by regex
  • exp_moves (list or None) – Just the parsed string, and not None.
  • base_coords (Coords) – The base for a dependent 1st edge.
Returns:

a (Coords, Coords) with the 1st and 2nd capture-cell ordered from top-left –> bottom-right.

Return type:

tuple

Raises:

EmptyCaptureException – When targeting failed, and no target cell identified.

Examples:
>>> import numpy as np
>>> from pandalone.xleash import (
...     Cell, Edge, resolve_capture_rect, margin_coords_from_states_matrix)
>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ], dtype=bool)
>>> up, dn = margin_coords_from_states_matrix(states_matrix)
>>> st_edge = Edge(Cell('1', 'A'), 'DR')
>>> nd_edge = Edge(Cell('.', '.'), 'DR')
>>> resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
(Coords(row=3, col=2), Coords(row=4, col=2))

Using dependent coordinates for the 2nd edge:

>>> st_edge = Edge(Cell('_', '_'), None)
>>> nd_edge = Edge(Cell('.', '.'), 'UL')
>>> rect = resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
>>> rect
(Coords(row=2, col=2), Coords(row=4, col=5))

Using sheet’s margins:

>>> st_edge = Edge(Cell('^', '_'), None)
>>> nd_edge = Edge(Cell('_', '^'), None)
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True

Walking backwards:

>>> st_edge = Edge(Cell('^', '_'), 'L')          # Landing is full, so 'L' ignored.
>>> nd_edge = Edge(Cell('_', '_'), 'L', '+')    # '+' would also stop.
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True
class pandalone.xleash.ABCSheet[source]

Bases: object

A delegating to backend factory and sheet-wrapper with utility methods.

Parameters:
  • _states_matrix (np.ndarray) – The states-matrix cached, so recreate object to refresh it.
  • _margin_coords (dict) – limits used by _resolve_cell(), cached, so recreate object to refresh it.

Resource management is outside of the scope of this class, and must happen in the backend workbook/sheet instance.

xlrd examples:

>>> import xlrd                                       
>>> with xlrd.open_workbook(self.tmp) as wb:          
...     sheet = xleash.XlrdSheet(wb.sheet_by_name('Sheet1'))
...     ## Do whatever

win32 examples:

>>> with win32_workbook as wb:                        # hypothetical placeholder
...     sheet = xleash.win32Sheet(wb.sheet['Sheet1'])
TODO: Win32 Sheet example
__repr__()[source]

Return repr(self).

_close()[source]

Override it to release resources for this sheet.

_close_all()[source]

Override it to release resources this and all sibling sheets.

_read_margin_coords()[source]

Override if possible to read (any of the) limits directly from the sheet.

Returns:the 2 coords of the top-left & bottom-right full cells; either one may be None. By default returns (None, None).
Return type:(Coords, Coords)
Raise:EmptyCaptureException if sheet empty
_read_states_matrix()[source]

Read the states-matrix of the wrapped sheet.

Returns:A 2D-array with False wherever cell are blank or empty.
Return type:ndarray
get_margin_coords()[source]

Extract (and cache) margins either internally or from margin_coords_from_states_matrix().

Returns:the resolved top-left and bottom-right xleash.Coords
Return type:tuple
Raise:EmptyCaptureException if sheet empty
get_sheet_ids()[source]
Returns:a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type:SheetId or None
get_states_matrix()[source]

Read and cache the states-matrix of the wrapped sheet.

Returns:A 2D-array with False wherever cell are blank or empty.
Return type:ndarray
Raise:EmptyCaptureException if sheet empty
list_sheetnames()[source]

Return a list of names

open_sibling_sheet(sheet_id)[source]

Return a sibling sheet by the given index or name

read_rect(st, nd)[source]

Fetch the actual values from the backend Excel-sheet.

Parameters:
  • st (Coords) – the top-left edge, inclusive
  • None nd (Coords,) – the bottom-right edge, inclusive(!); when None, must return a scalar value.
Returns:

Depends on whether both coords are given:
  • If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
  • If only the 1st given, the scalar value; if beyond margins, an error is raised!

Return type:

list

Raise:

EmptyCaptureException (optionally) if sheet empty

class pandalone.xleash.ArraySheet(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]

Bases: pandalone.xleash.io.backend.ABCSheet

A sample ABCSheet made out of 2D-list or numpy-arrays, for facilitating tests.

__init__(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]

Initialize self. See help(type(self)) for accurate signature.

__repr__()[source]

Return repr(self).

_read_states_matrix()[source]

Read the states-matrix of the wrapped sheet.

Returns:A 2D-array with False wherever cell are blank or empty.
Return type:ndarray
get_sheet_ids()[source]
Returns:a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type:SheetId or None
list_sheetnames()[source]

Return a list of names

open_sibling_sheet(sheet_id)[source]

Return a sibling sheet by the given index or name

read_rect(st, nd)[source]

Fetch the actual values from the backend Excel-sheet.

Parameters:
  • st (Coords) – the top-left edge, inclusive
  • None nd (Coords,) – the bottom-right edge, inclusive(!); when None, must return a scalar value.
Returns:

Depends on whether both coords are given:
  • If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
  • If only the 1st given, the scalar value; if beyond margins, an error is raised!

Return type:

list

Raise:

EmptyCaptureException (optionally) if sheet empty

pandalone.xleash.coords2Cell(row, col)[source]

Make A1 Cell from resolved coords, with rudimentary error-checking.

Examples:

>>> coords2Cell(row=0, col=0)
Cell(row='1', col='A')
>>> coords2Cell(row=0, col=26)
Cell(row='1', col='AA')

>>> coords2Cell(row=10, col='.')
Cell(row='11', col='.')

>>> coords2Cell(row=-3, col=-2)
Traceback (most recent call last):
AssertionError: negative row!
exception pandalone.xleash.EmptyCaptureException[source]

Bases: Exception

Thrown when targeting fails.

pandalone.xleash.margin_coords_from_states_matrix(states_matrix)[source]

Returns top-left/bottom-down margins of full cells from a state matrix.

May be used by ABCSheet.get_margin_coords() if a backend does not report the sheet-margins internally.

Parameters:states_matrix (np.ndarray) – A 2D-array with False wherever cell are blank or empty. Use ABCSheet.get_states_matrix() to derrive it.
Returns:the 2 coords of the top-left & bottom-right full cells
Return type:(Coords, Coords)
Examples:
>>> import numpy as np
>>> states_matrix = np.asarray([
...    [0, 0, 0],
...    [0, 1, 0],
...    [0, 1, 1],
...    [0, 0, 1],
... ])
>>> margins = margin_coords_from_states_matrix(states_matrix)
>>> margins
(Coords(row=1, col=1), Coords(row=3, col=2))

Note that the bottom-right margin is not the same as the states_matrix size:

>>> states_matrix = np.asarray([
...    [0, 0, 0, 0],
...    [0, 1, 0, 0],
...    [0, 1, 1, 0],
...    [0, 0, 1, 0],
...    [0, 0, 0, 0],
... ])
>>> margin_coords_from_states_matrix(states_matrix) == margins
True
pandalone.xleash.lasso(xlref, sheets_factory=None, base_opts=None, available_filters=None, return_lasso=False, **context_kwds)[source]

High-level function to lasso around spreadsheet’s rect-regions according to xl-ref strings by using internally a Ranger .

Parameters:
  • xlref (str) –

    a string with the xl-ref format:

    <url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
    

    i.e.:

    file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
    
  • sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, the new SheetsFactory created is closed afterwards. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
  • available_filters (dict or None) – Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
  • return_lasso (bool) –

    If True, values are contained in the returned Lasso instance, along with all other artifacts of the lassoing procedure.

    For more debugging help, create a Ranger yourself and inspect the Ranger.intermediate_lasso.

  • context_kwds (Lasso) – Default Lasso fields in case parsed ones are None (i.e. you can specify the sheet like that).
Variables:

base_opts – Opts affecting the lassoing procedure that are deep-copied and used as the base-opts for every Ranger.do_lasso(), whether invoked directly or recursively by recursive_filter(). Read the code to be sure what the available choices are. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.

Returns:

Either the captured & filtered values or the final Lasso, depending on the return_lasso arg.

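Following the xl-ref format shown above, such a string can also be assembled programmatically from its parts; a small sketch (all paths and names are hypothetical):

```python
# The parts of an xl-ref, matching:
#   <url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
url_file = 'file:///path/to/file.xls'   # hypothetical workbook path
sheet = 'sheet_name'
st_edge, nd_edge = 'UPT8(LU-)', '_.(D+)'
exp_moves = 'LDL1'
js_filt = '{"dims":1}'

xlref = f'{url_file}#{sheet}!{st_edge}:{nd_edge}:{exp_moves}{js_filt}'
print(xlref)
# file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
```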
class pandalone.xleash.Ranger(sheets_factory, base_opts=None, available_filters=None)[source]

Bases: object

The director-class that performs all stages required for “throwing the lasso” around rect-values.

Use it when you need to have total control of the procedure and configuration parameters, since no defaults are assumed.

The do_lasso() does the job.

Variables:
  • sheets_factory (SheetsFactory) – Factory of sheets from which to parse rect-values; it is not closed in the end. May be None, but do_lasso() will scream unless invoked with a context_lasso arg containing a concrete ABCSheet.
  • base_opts (dict) – The opts that are deep-copied and used as the defaults for every do_lasso(), whether invoked directly or recursively by recursive_filter(). If unspecified, no opts are used, but this attr is set to an empty dict. See get_default_opts().
  • available_filters (dict or None) – The filters available for an xl-ref to use. If None, then uses xleash.installed_filters. Use an empty dict not to use any filters.
  • intermediate_lasso (Lasso) – A ('stage', Lasso) pair with the last Lasso instance produced during the last execution of do_lasso(). Used for inspecting/debugging.
__init__(sheets_factory, base_opts=None, available_filters=None)[source]

Initialize self. See help(type(self)) for accurate signature.

_make_init_Lasso(**context_kwds)[source]

Creates the lasso to be used for each new do_lasso() invocation.

_parse_and_merge_with_context(xlref, init_lasso)[source]

Merges the xl-ref's parsed fields with init_lasso, reporting any errors.

Parameters:init_lasso (Lasso) – Default values to be overridden by non-nulls.
Returns:a Lasso with any non None parsed-fields updated
_relasso(lasso, stage, **kwds)[source]

Replaces lasso-values and updates intermediate_lasso.

_resolve_capture_rect(lasso, sheet)[source]

Also handles EmptyCaptureException in case opts['no_empty'] != False.

do_lasso(xlref, **context_kwds)[source]

The director-method that does all the job of throwing a lasso around spreadsheet rect-regions according to an xl-ref.

Parameters:
  • xlref (str) –

    a string with the xl-ref format:

    <url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
    

    i.e.:

    file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
    
  • context_kwds (Lasso) – Default Lasso fields in case parsed ones are None
Returns:

The final Lasso with captured & filtered values.

Return type:

Lasso

make_call(lasso, func_name, args, kwds)[source]

Executes a call-spec respecting any lax argument popped from kwds.

Parameters:lax (bool) – After overlaying it on opts, it governs whether to raise on errors. Defaults to False (scream!).
class pandalone.xleash.SheetsFactory(backends=None)[source]

Bases: pandalone.xleash.io.backend.SimpleSheetsFactory

A caching-store of ABCSheet instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends.

Variables:_cached_sheets (dict) – A cache of all _Spreadsheets accessed so far, keyed by multiple keys generated by _derive_sheet_keys().
  • To avoid opening non-trivial workbooks, use the add_sheet() to pre-populate this cache with them.
  • It is a resource-manager for contained sheets, so it can be used with a with statement.
__init__(backends=None)[source]
Parameters:backends (list or None) – The list of backends to consider when opening sheets. If it evaluates to false, io_backends is assumed.
_derive_sheet_keys(sheet, wb_ids=None, sh_ids=None)[source]

Returns the product of user-specified and sheet-internal keys.

Parameters:
  • wb_ids – a single or a sequence of extra workbook-ids (ie: file, url)
  • sh_ids – a single or sequence of extra sheet-ids (ie: name, index, None)
add_sheet(sheet, wb_ids=None, sh_ids=None)[source]

Updates cache.

Parameters:
  • wb_ids – a single or sequence of extra workbook-ids (ie: file, url)
  • sh_ids – a single or sequence of extra sheet-ids (ie: name, index, None)
close()[source]

Closes all contained sheets and empties cache.

fetch_sheet(wb_id, sheet_id, base_sheet=None)[source]
Parameters:base_sheet (ABCSheet) – The sheet used when unspecified wb_id.
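The caching and resource-manager behaviour described above can be sketched with a toy stand-in (this is not the real SheetsFactory API — class and sheet objects here are illustrative only):

```python
class ToySheetsFactory:
    """Toy cache keyed by (workbook-id, sheet-id); a stand-in for SheetsFactory."""

    def __init__(self):
        self._cached_sheets = {}

    def add_sheet(self, sheet, wb_id, sh_id):
        # Pre-populate the cache to avoid re-opening non-trivial workbooks.
        self._cached_sheets[(wb_id, sh_id)] = sheet

    def fetch_sheet(self, wb_id, sh_id):
        return self._cached_sheets[(wb_id, sh_id)]

    def close(self):
        # Empty the cache (the real factory also closes contained sheets).
        self._cached_sheets.clear()

    # Resource-manager protocol, so it works in a `with` statement.
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

with ToySheetsFactory() as fac:
    fac.add_sheet(sheet='fake-sheet', wb_id='book.xlsx', sh_id='Sheet1')
    assert fac.fetch_sheet('book.xlsx', 'Sheet1') == 'fake-sheet'
```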
pandalone.xleash.io_backends = []

Hook for plugins to append ABCBackend instances.

pandalone.xleash.make_default_Ranger(sheets_factory=None, base_opts=None, available_filters=None)[source]

Makes a defaulted Ranger.

Parameters:
  • sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, a new SheetsFactory is created. Remember to invoke its SheetsFactory.close() to clear resources from any opened sheets.
  • base_opts (dict or None) –

    Default opts to affect the lassoing, to be merged with defaults; uses get_default_opts().

Read the code to be sure what the available choices are :-(.

  • available_filters (dict or None) – The filters available for a xl-ref to use. (xleash.installed_filters used if unspecified).

For instance, to make your own sheets-factory and override options, you may do this:

>>> from pandalone import xleash

>>> with xleash.SheetsFactory() as sf:
...     xleash.make_default_Ranger(sf, base_opts={'lax': True})
<pandalone.xleash._lasso.Ranger object at
...
class pandalone.xleash.XLocation(sheet, st, nd, base_coords)

Bases: tuple

Fields denoting the position of a sheet/cell while running an element-wise filter.

Practically, run_filter_elementwise() preserves these fields if the processed ones were None.

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, sheet, st, nd, base_coords)

Create new instance of XLocation(sheet, st, nd, base_coords)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object at 0xa385c0>, len=<built-in function len>)

Make a new XLocation object from a sequence or iterable

_replace(**kwds)

Return a new XLocation object replacing specified fields with new values

base_coords

Alias for field number 3

nd

Alias for field number 2

sheet

Alias for field number 0

st

Alias for field number 1

pandalone.xleash.get_default_opts(overrides=None)[source]

Default opts used by lasso() when constructing its internal Ranger.

Parameters:overrides (dict or None) – Any items to update the default ones.
pandalone.xleash.installed_filters = {'dict': {...}, 'numpy': {...}, 'odict': {...}, 'pipe': {...}, 'py': {...}, 'pyeval': {...}, 'recurse': {...}, 'redim': {...}, 'sorted': {...}}

Hook for plugins to append filters.

class pandalone.xleash.Lasso(xl_ref, url_file, sh_name, st_edge, nd_edge, exp_moves, call_spec, sheet, st, nd, values, base_coords, opts)

Bases: tuple

All the fields used by the algorithm, populated stage-by-stage by Ranger.

Parameters:
  • xl_ref (str) – The full url, populated on parsing.
  • sh_name (str) –

    Parsed sheet name (or index, but still as string), populated on parsing.

    Note

    If you need the name of the captured sheet, use:

    lasso.sheet.get_sheet_ids().ids[0]
    
  • st_edge (Edge) – The 1st edge, populated on parsing.
  • nd_edge (Edge) – The 2nd edge, populated on parsing.
  • st (Coords) – The top-left targeted coords of the capture-rect, populated on capturing.
  • nd (Coords) – The bottom-right targeted coords of the capture-rect, populated on capturing.
  • sheet (ABCSheet) – The sheet fetched from the factory, or the ranger's current one; populated after capturing, before reading.
  • values – The excel table-values captured by the lasso; populated after reading and updated while applying filters.
  • call_spec – The call-spec derived from the parsed filters, to be fed into Ranger.make_call().
  • base_coords (Coords) – On recursive calls it becomes the base-cell for the 1st edge.
  • opts (dict or ChainMap) –
    • Before parsing, they are just any 'opts' dict found in the filters.
    • After parsing, a 2-map ChainMap with Ranger.base_opts and options extracted from filters on top.
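The 2-map ChainMap described for opts can be illustrated with the stdlib directly (the option keys below are hypothetical, for illustration only):

```python
from collections import ChainMap

# Opts fed to the Ranger as its base (hypothetical keys).
base_opts = {'lax': False, 'verbose': False}
# Opts extracted from an xl-ref's filters.
filter_opts = {'lax': True}

# Filter opts sit on top, so they win over the base ones.
opts = ChainMap(filter_opts, base_opts)
print(opts['lax'], opts['verbose'])  # True False
```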
__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, xl_ref=None, url_file=None, sh_name=None, st_edge=None, nd_edge=None, exp_moves=None, call_spec=None, sheet=None, st=None, nd=None, values=None, base_coords=None, opts=None)

Create new instance of Lasso(xl_ref, url_file, sh_name, st_edge, nd_edge, exp_moves, call_spec, sheet, st, nd, values, base_coords, opts)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object at 0xa385c0>, len=<built-in function len>)

Make a new Lasso object from a sequence or iterable

_replace(**kwds)

Return a new Lasso object replacing specified fields with new values

base_coords

Alias for field number 11

call_spec

Alias for field number 6

exp_moves

Alias for field number 5

nd

Alias for field number 9

nd_edge

Alias for field number 4

opts

Alias for field number 12

sh_name

Alias for field number 2

sheet

Alias for field number 7

st

Alias for field number 8

st_edge

Alias for field number 3

url_file

Alias for field number 1

values

Alias for field number 10

xl_ref

Alias for field number 0

pandalone.xleash.xlwings_dims_call_spec()[source]

A list call-spec for the _redim_filter() that imitates the results of the xlwings library.

class pandalone.xleash.Cell[source]

Bases: pandalone.xleash._parse.Cell

A pair of 1-based strings, denoting the “A1” coordinates of a cell.

The “num” coords (numeric, 0-based) are specified using numpy-arrays (Coords).

static __new__(cls, row, col, brow=None, bcol=None)[source]

Create new instance of Cell(row, col, brow, bcol)

__repr__()[source]

Return a nicely formatted representation string

__str__()[source]

Return str(self).

class pandalone.xleash.Coords(row, col)

Bases: tuple

A pair of 0-based integers denoting the “num” coordinates of a cell.

The “A1” coords (1-based coordinates) are specified using Cell.

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, row, col)

Create new instance of Coords(row, col)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object at 0xa385c0>, len=<built-in function len>)

Make a new Coords object from a sequence or iterable

_replace(**kwds)

Return a new Coords object replacing specified fields with new values

col

Alias for field number 1

row

Alias for field number 0

class pandalone.xleash.Edge[source]

Bases: pandalone.xleash._parse.Edge

All the info required to target a cell.

An Edge contains A1 Cell as land.

static __new__(cls, land, mov=None, mod=None)[source]

Create new instance of Edge(land, mov, mod)

__str__()[source]

Return str(self).

class pandalone.xleash.CallSpec(func, args, kwds)

Bases: tuple

The call-specifier for holding the parsed json-filters.

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, func, args=[], kwds={})

Create new instance of CallSpec(func, args, kwds)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object at 0xa385c0>, len=<built-in function len>)

Make a new CallSpec object from a sequence or iterable

_replace(**kwds)

Return a new CallSpec object replacing specified fields with new values

args

Alias for field number 1

func

Alias for field number 0

kwds

Alias for field number 2

pandalone.xleash.parse_xlref(xlref)[source]

Like _parse_xlref() but also tries the case where xlref is encased by the delimiter chars /\"$%&.

See also

_encase_regex

5.1.7. Submodule: pandalone.xleash._parse

The syntax-parsing part xleash.

Prefer accessing the public members from the parent module.

class pandalone.xleash._parse.CallSpec(func, args, kwds)

Bases: tuple

The call-specifier for holding the parsed json-filters.

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, func, args=[], kwds={})

Create new instance of CallSpec(func, args, kwds)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object at 0xa385c0>, len=<built-in function len>)

Make a new CallSpec object from a sequence or iterable

_replace(**kwds)

Return a new CallSpec object replacing specified fields with new values

args

Alias for field number 1

func

Alias for field number 0

kwds

Alias for field number 2

class pandalone.xleash._parse.Cell[source]

Bases: pandalone.xleash._parse.Cell

A pair of 1-based strings, denoting the “A1” coordinates of a cell.

The “num” coords (numeric, 0-based) are specified using numpy-arrays (Coords).

static __new__(cls, row, col, brow=None, bcol=None)[source]

Create new instance of Cell(row, col, brow, bcol)

__repr__()[source]

Return a nicely formatted representation string

__str__()[source]

Return str(self).

class pandalone.xleash._parse.Edge[source]

Bases: pandalone.xleash._parse.Edge

All the info required to target a cell.

An Edge contains A1 Cell as land.

static __new__(cls, land, mov=None, mod=None)[source]

Create new instance of Edge(land, mov, mod)

__str__()[source]

Return str(self).

pandalone.xleash._parse.Edge_new(row, col, mov=None, mod=None, default=None)[source]

Make a new Edge from any non-None values supplied, capitalized, or return None when nothing is given.

Parameters:
  • col (str, None) – ie A
  • row (str, None) – ie 1
  • mov (str, None) – ie RU
  • mod (str, None) – ie +
Returns:

an Edge if any field is non-None, otherwise None

Return type:

Edge, None

Examples:

>>> Edge_new('1', 'a', 'Rul', '-')
Edge(land=Cell(row='1', col='A'), mov='RUL', mod='-')
>>> print(Edge_new('5', '5'))
R5C5

No error checking performed:

>>> Edge_new('Any', 'foo', 'BaR', '+_&%')
Edge(land=Cell(row='ANY', col='FOO'), mov='BAR', mod='+_&%')

>>> print(Edge_new(None, None, None, None))
None

except where coincidental:

>>> Edge_new(row=0, col=123, mov='BAR', mod=None)
Traceback (most recent call last):
AttributeError: 'int' object has no attribute 'upper'

>>> Edge_new(row=0, col='A', mov=123, mod=None)
Traceback (most recent call last):
AttributeError: 'int' object has no attribute 'upper'
pandalone.xleash._parse._excel_str_translator = {8220: 34, 8221: 34}

Excel uses these !@#% chars (smart-quotes) for double-quotes, which are not valid in JSON-strings!
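A sketch of how such a translation table plugs into str.translate() to repair a pasted filter-expression before JSON-parsing (the pasted string is illustrative):

```python
import json

# The same table as _excel_str_translator:
# “ (U+201C) and ” (U+201D) map to plain '"' (codepoint 34).
translator = {8220: 34, 8221: 34}

filt = '{“dims”: 1}'                 # pasted from Excel: not valid JSON
fixed = filt.translate(translator)
print(fixed)                          # {"dims": 1}
assert json.loads(fixed) == {'dims': 1}
```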

pandalone.xleash._parse._parse_xlref(xlref)[source]

Parse a xl-ref into a dict.

Parameters:xlref (str) – A url-string abiding to the xl-ref syntax.
Returns:A dict with all fields, with None with those missing.
Return type:dict

Examples:

>>> res = parse_xlref('workbook.xlsx#Sheet1!A1(DR+):Z20(UL):L1U2R1D1:'
...                             '{"opts":{}, "func": "foo"}')
>>> sorted(res.items())
[('call_spec', CallSpec(func='foo', args=[], kwds={})),
 ('exp_moves', 'L1U2R1D1'),
 ('nd_edge', Edge(land=Cell(row='20', col='Z'), mov='UL', mod=None)),
 ('opts', {}),
 ('sh_name', 'Sheet1'),
 ('st_edge', Edge(land=Cell(row='1', col='A'), mov='DR', mod='+')),
 ('url_file', 'workbook.xlsx'),
 ('xl_ref', 'workbook.xlsx#Sheet1!A1(DR+):Z20(UL):L1U2R1D1:{"opts":{}, "func": "foo"}')]

Shortcut for all sheet from top-left to bottom-right full-cells:

>>> res=parse_xlref('#:')
>>> sorted(res.items())
[('call_spec', None),
 ('exp_moves', None),
 ('nd_edge', Edge(land=Cell(row='_', col='_'), mov=None, mod=None)),
 ('opts', None),
 ('sh_name', None),
 ('st_edge', Edge(land=Cell(row='^', col='^'), mov=None, mod=None)),
 ('url_file', None),
 ('xl_ref', '#:')]

Errors:

>>> parse_xlref('A1(DR)Z20(UL)')
Traceback (most recent call last):
SyntaxError: No fragment-part (starting with '#'): A1(DR)Z20(UL)

>>> parse_xlref('#A1(DR)Z20(UL)')          ## Missing ':'.
Traceback (most recent call last):
SyntaxError: Not an `xl-ref` syntax: A1(DR)Z20(UL)

But as soon as syntax is matched, subsequent errors raised are ValueErrors:

>>> parse_xlref("#A1:B1:{'Bad_JSON_str'}")
Traceback (most recent call last):
ValueError: Filters are not valid JSON:
Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
  JSON:
{'Bad_JSON_str'}
pandalone.xleash._parse._regular_xlref_regex = re.compile(r'...', re.IGNORECASE | re.DOTALL | re.VERBOSE)

The regex for parsing regular xl-ref.

pandalone.xleash._parse._repeat_moves(moves, times=None)[source]

Returns an iterator that repeats moves x times, or infinite if unspecified.

Used when parsing primitive directions.

Parameters:
  • moves (str) – the moves to repeat ie RU1D?
  • times (str) – N of repetitions. If None it means infinite repetitions.
Returns:

An iterator of the moves

Return type:

iterator

Examples:

>>> list(_repeat_moves('LUR', '3'))
['LUR', 'LUR', 'LUR']
>>> list(_repeat_moves('ABC', '0'))
[]
>>> _repeat_moves('ABC')  ## infinite repetitions
repeat('ABC')
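Assuming _repeat_moves() is a thin wrapper over itertools.repeat (as the outputs above suggest), its behaviour can be sketched as:

```python
from itertools import islice, repeat

def repeat_moves(moves, times=None):
    """Repeat `moves` `times` times, or forever when `times` is None."""
    return repeat(moves) if times is None else repeat(moves, int(times))

assert list(repeat_moves('LUR', '3')) == ['LUR', 'LUR', 'LUR']
assert list(repeat_moves('ABC', '0')) == []
# Infinite repetitions: slice a few items off instead of list()-ing it.
assert list(islice(repeat_moves('ABC'), 4)) == ['ABC'] * 4
```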

pandalone.xleash._parse.parse_call_spec(call_spec_values)[source]

Parse call-specifier from json-filters.

Parameters:call_spec_values

This is a non-null structure specifying some function call in the filter part, which can be either:

  • string: "func_name"
  • list: ["func_name", ["arg1", "arg2"], {"k1": "v1"}] where the last 2 parts are optional and can be given in any order;
  • object: {"func": "func_name", "args": ["arg1"], "kwds": {"k":"v"}} where the args and kwds are optional.
Returns:the 3-tuple func, args=(), kwds={} with the defaults as shown when missing.
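A simplified sketch of normalizing those three shapes into the (func, args, kwds) triple (illustrative only; the real parse_call_spec() also validates its input):

```python
def normalize_call_spec(spec):
    """Normalize the 3 accepted call-spec shapes into (func, args, kwds)."""
    if isinstance(spec, str):
        # string-form: just the function name
        return spec, (), {}
    if isinstance(spec, dict):
        # object-form: "args" and "kwds" are optional
        return spec['func'], tuple(spec.get('args', ())), spec.get('kwds', {})
    # list-form: func first, then args-list and/or kwds-dict in any order
    func, args, kwds = spec[0], (), {}
    for part in spec[1:]:
        if isinstance(part, dict):
            kwds = part
        else:
            args = tuple(part)
    return func, args, kwds

assert normalize_call_spec('df') == ('df', (), {})
assert normalize_call_spec(['foo', {'k1': 'v1'}, ['a1']]) == ('foo', ('a1',), {'k1': 'v1'})
assert normalize_call_spec({'func': 'foo', 'args': ['a1']}) == ('foo', ('a1',), {})
```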
pandalone.xleash._parse.parse_expansion_moves(exp_moves)[source]

Parse rect-expansion into a list of dir-letters iterables.

Parameters:exp_moves – A string with a sequence of primitive moves, e.g. L1U1R1D1
Returns:A list of primitive-dir chains.
Return type:list

Examples:

>>> res = parse_expansion_moves('lu1urd?')
>>> res
[repeat('L'), repeat('U', 1), repeat('UR'), repeat('D', 1)]

# infinite generator
>>> [next(res[0]) for i in range(10)]
['L', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'L']

>>> list(res[1])
['U']

>>> parse_expansion_moves('1LURD')
Traceback (most recent call last):
ValueError: Invalid rect-expansion(1LURD) due to:
        'NoneType' object has no attribute 'groupdict'
pandalone.xleash._parse.parse_xlref(xlref)[source]

Like _parse_xlref() but also tries the case where xlref is encased by the delimiter chars /\"$%&.

See also

_encase_regex

pandalone.xleash._parse.parse_xlref_fragment(xlref_fragment)[source]

Parses an xl-ref fragment, i.e. anything to the right of the hash (#).

Parameters:xlref_fragment (str) – the url-fragment part of the xl-ref string, without the '#' char.
Returns:dictionary containing the following parameters:
  • sheet: (str, int, None) i.e. sheet_name
  • st_edge: (Edge, None) the 1st-ref, with raw cell i.e. Edge(land=Cell(row='8', col='UPT'), mov='LU', mod='-')
  • nd_edge: (Edge, None) the 2nd-ref, with raw cell i.e. Edge(land=Cell(row='_', col='.'), mov='D', mod='+')
  • exp_moves: (sequence, None), as i.e. LDL1 parsed by parse_expansion_moves()
  • js_filt: dict i.e. {"dims": 1}
Return type:dict

Examples:

>>> res = parse_xlref_fragment('Sheet1!A1(DR+):Z20(UL):L1U2R1D1:'
...                             '{"opts":{}, "func": "foo"}')
>>> sorted(res.items())
[('call_spec', CallSpec(func='foo', args=[], kwds={})),
 ('exp_moves', 'L1U2R1D1'),
 ('nd_edge', Edge(land=Cell(row='20', col='Z'), mov='UL', mod=None)),
 ('opts', {}),
 ('sh_name', 'Sheet1'),
 ('st_edge', Edge(land=Cell(row='1', col='A'), mov='DR', mod='+'))]

Shortcut for all sheet from top-left to bottom-right full-cells:

>>> res = parse_xlref_fragment(':')
>>> sorted(res.items())
[('call_spec', None),
 ('exp_moves', None),
 ('nd_edge', Edge(land=Cell(row='_', col='_'), mov=None, mod=None)),
 ('opts', None),
 ('sh_name', None),
 ('st_edge', Edge(land=Cell(row='^', col='^'), mov=None, mod=None))]

Errors:

>>> parse_xlref_fragment('A1(DR)Z20(UL)')
Traceback (most recent call last):
SyntaxError: Not an `xl-ref` syntax: A1(DR)Z20(UL)

5.1.8. Submodule: pandalone.xleash.io

Backends for opening sheets from various sources.

5.1.9. Submodule: pandalone.xleash.io._sheets

5.1.10. Submodule: pandalone.xleash.io._xlrd

5.1.11. Submodule: pandalone.xleash._capture

The algorithmic part of capturing.

Prefer accessing the public members from the parent module.

pandalone.xleash._capture.CHECK_CELLTYPE = False

When True, most coord-functions accept any 2-tuples.

exception pandalone.xleash._capture.EmptyCaptureException[source]

Bases: Exception

Thrown when targeting fails.

pandalone.xleash._capture._col2num(coord)[source]

Resolves special coords or converts Excel A1 columns to zero-based ones, reporting invalids.

Parameters:coord (str) – excel-column coordinate or one of ^_.
Returns:excel column number, >= 0
Return type:int

Examples:

>>> col = _col2num('D')
>>> col
3
>>> _col2num('d') == col
True
>>> _col2num('AaZ')
727
>>> _col2num('10')
9
>>> _col2num(9)
8

Negatives (from left-end) are preserved:

>>> _col2num('-1')
-1

Fails ugly:

>>> _col2num('%$')
Traceback (most recent call last):
ValueError: substring not found

>>> _col2num([])
Traceback (most recent call last):
TypeError: int() argument must be a string, a bytes-like object or
            a number, not 'list'
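The letter-form conversion is plain base-26 arithmetic; a sketch (this handles only the A1 letter form, not the numeric or special coords the real helper also accepts):

```python
import string

def col2num(coord):
    """Base-26 'A1' column letters -> 0-based column index."""
    num = 0
    for ch in coord.upper():
        # index() raises ValueError('substring not found') on junk like '%$',
        # matching the "fails ugly" behaviour shown above.
        num = num * 26 + string.ascii_uppercase.index(ch) + 1
    return num - 1

assert col2num('D') == 3
assert col2num('d') == 3      # case-insensitive
assert col2num('AaZ') == 727
```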
pandalone.xleash._capture._expand_rect(states_matrix, r1, r2, exp_moves)[source]

Applies the expansion-moves based on the states_matrix.

Parameters:
  • states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
  • r1 (Coords) – any vertex of the rect to expand
  • r2 (Coords) – any vertex of the rect to expand
  • exp_moves – Just the parsed string, and not None.
Returns:

a sorted rect top-left/bottom-right

Examples:

>>> states_matrix = np.array([
...     #0  1  2  3  4  5
...     [0, 0, 0, 0, 0, 0], #0
...     [0, 0, 1, 1, 1, 0], #1
...     [0, 1, 0, 0, 1, 0], #2
...     [0, 1, 1, 1, 1, 0], #3
...     [0, 0, 0, 0, 0, 1], #4
... ], dtype=bool)

>>> r1, r2 = (Coords(2, 1), Coords(2, 1))
>>> _expand_rect(states_matrix, r1, r2, 'U')
(Coords(row=2, col=1), Coords(row=2, col=1))

>>> r1, r2 = (Coords(3, 1), Coords(2, 1))
>>> _expand_rect(states_matrix, r1, r2, 'R')
(Coords(row=2, col=1), Coords(row=3, col=4))

>>> r1, r2 = (Coords(2, 1), Coords(6, 1))
>>> _expand_rect(states_matrix, r1, r2, 'r')
(Coords(row=2, col=1), Coords(row=6, col=5))

>>> r1, r2 = (Coords(2, 3), Coords(2, 3))
>>> _expand_rect(states_matrix, r1, r2, 'LURD')
(Coords(row=1, col=1), Coords(row=3, col=4))
pandalone.xleash._capture._extract_states_vector(states_matrix, dn_coords, land, mov)[source]

Extract a slice from the states-matrix by starting from land and following mov.

pandalone.xleash._capture._resolve_cell(cell, up_coords, dn_coords, base_coords=None)[source]

Translates any special coords to absolute ones.

To get the margin_coords, use one of:

  • ABCSheet.get_margin_coords()
  • io._sheets.margin_coords_from_states_matrix()
Parameters:
  • cell (Cell) – The “A1” cell to translate its coords.
  • up_coords (Coords) – the top-left resolved coords with full-cells
  • dn_coords (Coords) – the bottom-right resolved coords with full-cells
  • base_coords (Coords) – A resolved cell to base dependent coords (.).
Returns:

the resolved cell-coords

Return type:

Coords

Examples:

>>> up = Coords(1, 2)
>>> dn = Coords(10, 6)
>>> base = Coords(40, 50)

>>> _resolve_cell(Cell(col='B', row='5'), up, dn)
Coords(row=4, col=1)

>>> _resolve_cell(Cell('^', '^'), up, dn)
Coords(row=1, col=2)

>>> _resolve_cell(Cell('_', '_'), up, dn)
Coords(row=10, col=6)

>>> base == _resolve_cell(Cell('.', '.'), up, dn, base)
True

>>> _resolve_cell(Cell('-1', '-2'), up, dn)
Coords(row=10, col=5)

>>> _resolve_cell(Cell('A', 'B'), up, dn)
Traceback (most recent call last):
ValueError: invalid cell(Cell(row='A', col='B')) due to:
        invalid row('A') due to: invalid literal for int() with base 10: 'A'

But notice when base-cell missing:

>>> _resolve_cell(Cell('1', '.'), up, dn)
Traceback (most recent call last):
ValueError: invalid cell(Cell(row='1', col='.')) due to:
Cannot resolve `relative-col` without `base-coord`!
pandalone.xleash._capture._resolve_coord(cname, cfunc, coord, up_coord, dn_coord, base_coords=None)[source]

Translates special coords or converts Excel string 1-based rows/cols to zero-based, reporting invalids.

Parameters:
  • cname (str) – the coord-name, one of ‘row’, ‘column’
  • cfunc (function) – the function to convert coord str --> int
  • coord (int, str) – the “A1” coord to translate
  • up_coord (int) – the resolved top or left margin zero-based coordinate
  • dn_coord (int) – the resolved bottom or right margin zero-based coordinate
  • base_coords (int, None) – the resolved basis for dependent coord, if any
Returns:

the resolved coord or None if it were not a special coord.

Row examples:

>>> cname = 'row'

>>> r0 = _resolve_coord(cname, _row2num, '1', 1, 10)
>>> r0
0
>>> r0 == _resolve_coord(cname, _row2num, 1, 1, 10)
True
>>> _resolve_coord(cname, _row2num, '^', 1, 10)
1
>>> _resolve_coord(cname, _row2num, '_', 1, 10)
10
>>> _resolve_coord(cname, _row2num, '.', 1, 10, 13)
13
>>> _resolve_coord(cname, _row2num, '-3', 0, 10)
8

But notice when base-cell missing:

>>> _resolve_coord(cname, _row2num, '.', 0, 10, base_coords=None)
Traceback (most recent call last):
ValueError: Cannot resolve `relative-row` without `base-coord`!

Other ROW error-checks:

>>> _resolve_coord(cname, _row2num, '0', 0, 10)
Traceback (most recent call last):
ValueError: invalid row('0') due to: Uncooked-coord cannot be zero!

>>> _resolve_coord(cname, _row2num, 'a', 0, 10)
Traceback (most recent call last):
ValueError: invalid row('a') due to: invalid literal for int() with base 10: 'a'

>>> _resolve_coord(cname, _row2num, None, 0, 10)
Traceback (most recent call last):
ValueError: invalid row(None) due to:
        int() argument must be a string,
        a bytes-like object or a number, not 'NoneType'

Column examples:

>>> cname = 'column'

>>> _resolve_coord(cname, _col2num, 'A', 1, 10)
0
>>> _resolve_coord(cname, _col2num, 'DADA', 1, 10)
71084
>>> _resolve_coord(cname, _col2num, '.', 1, 10, 13)
13
>>> _resolve_coord(cname, _col2num, '-4', 0, 10)
7

And COLUMN error-checks:

>>> _resolve_coord(cname, _col2num, None, 0, 10)
Traceback (most recent call last):
ValueError: invalid column(None) due to: int() argument must be a string,
            a bytes-like object or a number, not 'NoneType'

>>> _resolve_coord(cname, _col2num, 0, 0, 10)
Traceback (most recent call last):
ValueError: invalid column(0) due to: Uncooked-coord cannot be zero!
pandalone.xleash._capture._row2num(coord)[source]

Resolves special coords or converts Excel 1-based rows to zero-based ones, reporting invalids.

Parameters:coord (str, int) – excel-row coordinate or one of ^_.
Returns:excel row number, >= 0
Return type:int

Examples:

>>> row = _row2num('1')
>>> row
0
>>> row == _row2num(1)
True

Negatives (from bottom) are preserved:

>>> _row2num('-1')
-1

Fails ugly:

>>> _row2num('.')
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '.'
pandalone.xleash._capture._sort_rect(r1, r2)[source]

Sorts rect-vertices in a 2D-array (with vertices in rows).

Example:

>>> _sort_rect((5, 3), (4, 6))
array([[4, 3],
       [5, 6]])
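The behaviour shown above matches a column-wise numpy sort; a sketch:

```python
import numpy as np

def sort_rect(r1, r2):
    """Sort two rect-vertices into a top-left/bottom-right 2x2 array."""
    # axis=0 sorts each column independently: rows first, then cols.
    return np.sort(np.array([r1, r2]), axis=0)

print(sort_rect((5, 3), (4, 6)))
# [[4 3]
#  [5 6]]
```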
pandalone.xleash._capture._target_opposite(states_matrix, dn_coords, land, moves, edge_name='')[source]

Follow moves from land and stop on the 1st full-cell.

Parameters:
  • states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
  • dn_coords (Coords) – the bottom-right coords of the full-cells margin
  • land (Coords) – the landing-cell
  • moves (str) – MUST not be empty
Returns:

the identified target-cell’s coordinates

Return type:

Coords

Examples:

>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ])
>>> args = (states_matrix, Coords(4, 5))

>>> _target_opposite(*(args + (Coords(0, 0), 'DR')))
Coords(row=3, col=2)

>>> _target_opposite(*(args + (Coords(0, 0), 'RD')))
Coords(row=2, col=3)

It fails if a non-empty target-cell cannot be found, or it ends-up beyond bounds:

>>> _target_opposite(*(args + (Coords(0, 0), 'D')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No opposite-target found
                while moving(D) from landing-Coords(row=0, col=0)!

>>> _target_opposite(*(args + (Coords(0, 0), 'UR')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No opposite-target found
                while moving(UR) from landing-Coords(row=0, col=0)!

But notice that the landing-cell maybe outside of bounds:

>>> _target_opposite(*(args + (Coords(3, 10), 'L')))
Coords(row=3, col=5)
pandalone.xleash._capture._target_same(states_matrix, dn_coords, land, moves, edge_name='')[source]

Scan the exterior row and column on the specified moves and stop on the last full-cell.

Parameters:
  • states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
  • dn_coords (Coords) – the bottom-right coords of the full-cells margin
  • land (Coords) – the landing-cell which MUST be within bounds
  • moves – which MUST not be empty
Returns:

the identified target-cell’s coordinates

Return type:

Coords

Examples:

>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ])
>>> args = (states_matrix, Coords(4, 5))

>>> _target_same(*(args + (Coords(4, 5), 'U')))
Coords(row=2, col=5)

>>> _target_same(*(args + (Coords(4, 5), 'L')))
Coords(row=4, col=2)

>>> _target_same(*(args + (Coords(4, 5), 'UL', )))
Coords(row=2, col=2)

It fails if landing is empty or beyond bounds:

>>> _target_same(*(args + (Coords(2, 2), 'DR')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No same-target found
                while moving(DR) from landing-Coords(row=2, col=2)!

>>> _target_same(*(args + (Coords(10, 3), 'U')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No same-target found
                while moving(U) from landing-Coords(row=10, col=3)!
pandalone.xleash._capture._target_same_vector(states_matrix, dn_coords, land, mov)[source]
Parameters:
  • states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
  • dn_coords (Coords) – the bottom-right coords of the full-cells
  • land (Coords) – The landing-cell, which MUST be full!
pandalone.xleash._capture.coords2Cell(row, col)[source]

Make A1 Cell from resolved coords, with rudimentary error-checking.

Examples:

>>> coords2Cell(row=0, col=0)
Cell(row='1', col='A')
>>> coords2Cell(row=0, col=26)
Cell(row='1', col='AA')

>>> coords2Cell(row=10, col='.')
Cell(row='11', col='.')

>>> coords2Cell(row=-3, col=-2)
Traceback (most recent call last):
AssertionError: negative row!
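The column-letter arithmetic behind the examples above (0 –> 'A', 26 –> 'AA') can be sketched as follows; this is an illustrative helper, not the library's actual implementation:

```python
def col_index_to_letters(col):
    """Convert a 0-based column index to A1-style letters (bijective base-26)."""
    assert col >= 0, "negative col!"
    letters = ''
    col += 1  # shift to 1-based: 1 -> 'A', 26 -> 'Z', 27 -> 'AA'
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord('A') + rem) + letters
    return letters
```

For instance, `col_index_to_letters(26)` yields `'AA'`, matching the `coords2Cell(row=0, col=26)` example above.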
pandalone.xleash._capture.resolve_capture_rect(states_matrix, up_dn_margins, st_edge, nd_edge=None, exp_moves=None, base_coords=None)[source]

Performs targeting, capturing and expansions based on the states-matrix.

To get the margin_coords, use one of:

  • ABCSheet.get_margin_coords()
  • io._sheets.margin_coords_from_states_matrix()

Its results can be fed into read_capture_values().

Parameters:
  • states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
  • up_dn_margins ((Coords, Coords)) – the top-left/bottom-right coords with full-cells
  • st_edge (Edge) – “uncooked” as matched by regex
  • nd_edge (Edge) – “uncooked” as matched by regex
  • exp_moves (list or None) – Just the parsed string, if not None.
  • base_coords (Coords) – The base for a dependent 1st edge.
Returns:

a (Coords, Coords) with the 1st and 2nd capture-cell ordered from top-left –> bottom-right.

Return type:

tuple

Raises:

EmptyCaptureException – When targeting failed, and no target cell identified.

Examples:
>>> from pandalone.xleash import Edge, margin_coords_from_states_matrix
>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ], dtype=bool)
>>> up, dn = margin_coords_from_states_matrix(states_matrix)
>>> st_edge = Edge(Cell('1', 'A'), 'DR')
>>> nd_edge = Edge(Cell('.', '.'), 'DR')
>>> resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
(Coords(row=3, col=2), Coords(row=4, col=2))

Using dependent coordinates for the 2nd edge:

>>> st_edge = Edge(Cell('_', '_'), None)
>>> nd_edge = Edge(Cell('.', '.'), 'UL')
>>> rect = resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
>>> rect
(Coords(row=2, col=2), Coords(row=4, col=5))

Using sheet’s margins:

>>> st_edge = Edge(Cell('^', '_'), None)
>>> nd_edge = Edge(Cell('_', '^'), None)
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True

Walking backwards:

>>> st_edge = Edge(Cell('^', '_'), 'L')          # Landing is full, so 'L' ignored.
>>> nd_edge = Edge(Cell('_', '_'), 'L', '+')    # '+' would also stop.
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True

5.1.12. Submodule: pandalone.xleash._filter

The high-level functionality, the filtering and recursive lassoing.

Prefer accessing the public members from the parent module.

class pandalone.xleash._filter.ASTInterpreter(symtable=None, usersyms=None, writer=None, err_writer=None, use_numpy=True, minimal=False, no_if=False, no_for=False, no_while=False, no_try=False, no_functiondef=False, no_ifexp=False, no_listcomp=False, no_augassign=False, no_assert=False, no_delete=False, no_raise=False, no_print=False, max_time=30)[source]

Bases: asteval.asteval.Interpreter

class pandalone.xleash._filter.XLocation(sheet, st, nd, base_coords)

Bases: tuple

Fields denoting the position of a sheet/cell while running an element-wise-filter.

Practically, run_filter_elementwise() preserves these fields if the processed ones were None.

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, sheet, st, nd, base_coords)

Create new instance of XLocation(sheet, st, nd, base_coords)

__repr__()

Return a nicely formatted representation string

_asdict()

Return a new OrderedDict which maps field names to their values.

classmethod _make(iterable, new=<built-in method __new__ of type object at 0xa385c0>, len=<built-in function len>)

Make a new XLocation object from a sequence or iterable

_replace(**kwds)

Return a new XLocation object replacing specified fields with new values

base_coords

Alias for field number 3

nd

Alias for field number 2

sheet

Alias for field number 0

st

Alias for field number 1

pandalone.xleash._filter._classify_rect_shape(st, nd)[source]

Identifies the rect-shape from its edge-coordinates (row, col, 2d-table).

Parameters:
  • st (Coords) – the top-left edge of capture-rect, inclusive
  • nd (Coords or None) – the bottom-right edge of capture-rect, inclusive
Returns:

an int based on the input, like that:

  • 0: only st given
  • 1: st and nd point the same cell
  • 2: row
  • 3: col
  • 4: 2d-table

Examples:

>>> _classify_rect_shape((1,1), None)
0
>>> _classify_rect_shape((2,2), (2,2))
1
>>> _classify_rect_shape((2,2), (2,20))
2
>>> _classify_rect_shape((2,2), (20,2))
3
>>> _classify_rect_shape((2,2), (20,20))
4
pandalone.xleash._filter._downdim(values, new_ndim)[source]

Squeeze it, and then flatten it, before inflating it.

Parameters:
  • values – The scalar or 2D-results of Sheet.read_rect()
  • new_ndim (int) – The new dimension the result should have
pandalone.xleash._filter._redim(values, new_ndim)[source]

Reshapes the capture-rect values of read_capture_rect().

Parameters:
  • values ((nested) list, *) – The scalar or 2D-results of Sheet.read_rect()
  • new_ndim
Returns:

reshaped values

Return type:

list of lists, list, *

Examples:

>>> _redim([1, 2], 2)
[[1, 2]]

>>> _redim([[1, 2]], 1)
[1, 2]

>>> _redim([], 2)
[[]]

>>> _redim([[3.14]], 0)
3.14

>>> _redim([[11, 22]], 0)
[11, 22]

>>> arr = [[[11], [22]]]
>>> arr == _redim(arr, None)
True

>>> _redim([[11, 22]], 0)
[11, 22]
pandalone.xleash._filter._updim(values, new_ndim)[source]

Append trivial dimensions to the left.

Parameters:
  • values – The scalar or 2D-results of Sheet.read_rect()
  • new_ndim (int) – The new dimension the result should have
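What “appending trivial dimensions to the left” means can be sketched with plain nested lists (illustrative only; the real helper also deals with numpy arrays):

```python
def updim_sketch(values, new_ndim):
    """Wrap values in nested lists until the nesting depth reaches new_ndim."""
    def ndim(v):
        # Depth of leftmost nesting; a scalar counts as 0-dimensional.
        d = 0
        while isinstance(v, list):
            d += 1
            v = v[0] if v else None
        return d
    while ndim(values) < new_ndim:
        values = [values]  # prepend a trivial (length-1) dimension
    return values
```

This mirrors the `_redim([1, 2], 2) -> [[1, 2]]` example above, which up-dimensions through the same mechanism.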
pandalone.xleash._filter.install_default_filters(filters_dict)[source]

Updates the default available filters used by lasso() when constructing its internal Ranger.

Parameters:filters_dict (dict) – The dictionary to update with the default filters.
pandalone.xleash._filter.pipe_filter(ranger, lasso, *filters, **kwds)[source]

A bulk-filter that applies all call-specifiers one after another on the capture-rect values.

Parameters:filters (list) – the json-parsed call-spec
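The chaining idea behind pipe_filter can be sketched like this, with hypothetical (func, args, kwds) triples standing in for the json-parsed call-specs:

```python
def pipe_sketch(value, *call_specs):
    """Apply each (func, args, kwds) call-spec in turn, feeding results forward."""
    for func, args, kwds in call_specs:
        value = func(value, *args, **kwds)
    return value

# Two chained call-specs: strip whitespace, then parse as int.
result = pipe_sketch(' 42 ', (str.strip, (), {}), (int, (), {}))  # -> 42
```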
pandalone.xleash._filter.py_filter(ranger, lasso, expr)[source]

A bulk-filter that passes values through a python-expression using asteval library.

The expr may access read-write all locals() of this method (ranger, lasso), the numpy funcs, and the pandalone.xleash module under the xleash variable.

The expr may return either:
  • the processed values, or
  • an instance of the Lasso, in which case only its opt field is checked and replaced with original if missing. So better use namedtuple._replace() on the current lasso which exists in the expr’s namespace.
Parameters:expr (str) – The python-expression, which may comprise of multiple statements.
pandalone.xleash._filter.pyeval_filter(ranger, lasso, filters=(), eval_all=False, include=None, exclude=None, depth=-1)[source]

An element-wise-filter that uses asteval to evaluate string values as python expressions.

The expr fetched from capturing may access read-write all locals() of this method (i.e. ranger, lasso), the numpy funcs, and the pandalone.xleash module under the xleash variable.

The expr may return either:
  • the processed values, or
  • an instance of the Lasso, in which case only its opt field is checked and replaced with original if missing. So better use namedtuple._replace() on the current lasso which exists in the expr’s namespace.
Parameters:
  • eval_all (bool) – If True, raise on the 1st error and stop diving into cells. Defaults to False.
  • filters (list) – Any filters to apply after invoking the element_func.
  • include (list or str) – Items to include when diving into “indexed” values. See run_filter_elementwise().
  • exclude (list or str) – Items to exclude when diving into “indexed” values. See run_filter_elementwise().
  • depth (int or None) – How deep to dive into nested structures, “indexed” or lists. If < 0, no limit. If 0, stops completely. See run_filter_elementwise().

Example:

>>> expr = '''
... res = array([[0.5, 0.3, 0.1, 0.1]])
... res * res.T
... '''
>>> lasso = Lasso(values=expr, opts={})
>>> ranger = Ranger(None)
>>> pyeval_filter(ranger, lasso).values
array([[ 0.25,  0.15,  0.05,  0.05],
       [ 0.15,  0.09,  0.03,  0.03],
       [ 0.05,  0.03,  0.01,  0.01],
       [ 0.05,  0.03,  0.01,  0.01]])
pandalone.xleash._filter.recursive_filter(ranger, lasso, filters=(), include=None, exclude=None, depth=-1)[source]

An element-wise-filter that expands recursively any xl-ref string elements in capture-rect values.

Parameters:
pandalone.xleash._filter.redim_filter(ranger, lasso, scalar=None, cell=None, row=None, col=None, table=None)[source]

A bulk-filter that reshapes and/or transposes captured values, depending on the rect’s shape.

Each dimension might be a single int or None, or a pair [dim, transpose].

pandalone.xleash._filter.run_filter_elementwise(ranger, lasso, element_func, filters, include=None, exclude=None, depth=-1, *args, **kwds)[source]

Runner of all element-wise filters.

It applies the element_func on elements extracted from lasso.values, by treating the latter first as “indexed” objects (Mappings, Series and DataFrames), and if that fails, as nested lists.

  • The include/exclude filter args work only for “indexed” objects with items() or iteritems() and indexing methods.

    • If no filter arg specified, expands for all keys.
    • If only include specified, rejects all keys not explicitly contained in this filter arg.
    • If only exclude specified, expands all keys not explicitly contained in this filter arg.
    • When both include/exclude exist, only those explicitly included are accepted, unless also excluded.
  • Lower the logging level to see errors other than syntax-errors reported in the log during recursion.

  • Only those in XLocation are passed recursively.

Parameters:
  • element_func (callable) –

    A function implementing the element-wise filter and returning a 2-tuple (is_processed, new_val_or_lasso), like that:

    def element_func(ranger, lasso, context, elval):
        proced = False
        try:
            elval = int(elval)
            proced = True
        except ValueError:
            pass
        return proced, elval
    

    Its kwds may contain the include, exclude and depth args. Any exception raised from element_func will cancel the diving.

  • filters (list) – Any filters to apply after invoking the element_func.
  • include (list or str) – Items to include when diving into “indexed” values. See description above.
  • exclude (list or str) – Items to exclude when diving into “indexed” values. See description above.
  • depth (int or None) – How deep to dive into nested structures, “indexed” or lists. If < 0, no limit. If 0, stops completely.
Param args:

To be relayed to element_func.

Param kwds:

To be relayed to element_func.
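The include/exclude rules described above can be sketched as a small key-selection helper (illustrative only, not the library's code):

```python
def select_keys(keys, include=None, exclude=None):
    """Pick which keys of an "indexed" value to dive into."""
    if include is None and exclude is None:
        return list(keys)            # no filter arg: expand all keys
    if include is not None and exclude is not None:
        # only those explicitly included, unless also excluded
        return [k for k in keys if k in include and k not in exclude]
    if include is not None:
        return [k for k in keys if k in include]
    return [k for k in keys if k not in exclude]
```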

pandalone.xleash._filter.xlwings_dims_call_spec()[source]

A list call-spec for the redim_filter() that imitates the results of the xlwings library.

5.1.13. Submodule: pandalone.xleash._lasso

The high-level functionality, the filtering and recursive lassoing.

Prefer accessing the public members from the parent module.

class pandalone.xleash._lasso.Ranger(sheets_factory, base_opts=None, available_filters=None)[source]

Bases: object

The director-class that performs all stages required for “throwing the lasso” around rect-values.

Use it when you need to have total control of the procedure and configuration parameters, since no defaults are assumed.

The do_lasso() does the job.

Variables:
  • sheets_factory (SheetsFactory) – Factory of sheets from where to parse rect-values; it is not closed in the end. May be None, but do_lasso() will scream unless invoked with a context_lasso arg containing a concrete ABCSheet.
  • base_opts (dict) – The opts that are deep-copied and used as the defaults for every do_lasso(), whether invoked directly or recursively by recursive_filter(). If unspecified, no opts are used, but this attr is set to an empty dict. See get_default_opts().
  • available_filters (dict or None) – The filters available for an xl-ref to use. If None, then uses xleash.installed_filters. Use an empty dict not to use any filters.
  • intermediate_lasso (Lasso) – A ('stage', Lasso) pair with the last Lasso instance produced during the last execution of do_lasso(). Used for inspecting/debugging.
__init__(sheets_factory, base_opts=None, available_filters=None)[source]

Initialize self. See help(type(self)) for accurate signature.

_make_init_Lasso(**context_kwds)[source]

Creates the lasso to be used for each new do_lasso() invocation.

_parse_and_merge_with_context(xlref, init_lasso)[source]

Merges xl-ref parsed-fields with init_lasso, reporting any errors.

Parameters:init_lasso (Lasso) – Default values to be overridden by non-nulls.
Returns:a Lasso with any non None parsed-fields updated
_relasso(lasso, stage, **kwds)[source]

Replace lasso-values and update intermediate_lasso.

_resolve_capture_rect(lasso, sheet)[source]

Also handles EmptyCaptureException in case opts['no_empty'] != False.

do_lasso(xlref, **context_kwds)[source]

The director-method that does all the job of throwing a lasso around a spreadsheet’s rect-regions, according to the xl-ref.

Parameters:
  • xlref (str) –

    a string with the xl-ref format:

    <url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
    

    i.e.:

    file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
    
  • context_kwds (Lasso) – Default Lasso fields in case parsed ones are None
Returns:

The final Lasso with captured & filtered values.

Return type:

Lasso

make_call(lasso, func_name, args, kwds)[source]

Executes a call-spec respecting any lax argument popped from kwds.

Parameters:lax (bool) – After overlaying it on opts, it governs whether to raise on errors. Defaults to False (scream!).
pandalone.xleash._lasso.get_default_opts(overrides=None)[source]

Default opts used by lasso() when constructing its internal Ranger.

Parameters:overrides (dict or None) – Any items to update the default ones.
pandalone.xleash._lasso.lasso(xlref, sheets_factory=None, base_opts=None, available_filters=None, return_lasso=False, **context_kwds)[source]

High-level function to lasso around a spreadsheet’s rect-regions according to xl-ref strings, by using a Ranger internally.

Parameters:
  • xlref (str) –

    a string with the xl-ref format:

    <url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
    

    i.e.:

    file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
    
  • sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, the new SheetsFactory created is closed afterwards. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
  • available_filters (dict or None) – Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
  • return_lasso (bool) –

    If True, values are contained in the returned Lasso instance, along with all other artifacts of the lassoing procedure.

For more debugging help, create a Ranger yourself and inspect Ranger.intermediate_lasso.

  • context_kwds (Lasso) – Default Lasso fields in case parsed ones are None (i.e. you can specify the sheet like that).
Variables:

base_opts – Opts affecting the lassoing procedure that are deep-copied and used as the base-opts for every Ranger.do_lasso(), whether invoked directly or recursively by recursive_filter(). Read the code to be sure what are the available choices. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.

Returns:

Either the captured & filtered values or the final Lasso, depending on the return_lasso arg.

pandalone.xleash._lasso.make_default_Ranger(sheets_factory=None, base_opts=None, available_filters=None)[source]

Makes a defaulted Ranger.

Parameters:
  • sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, a new SheetsFactory is created. Remember to invoke its SheetsFactory.close() to clear resources from any opened sheets.
  • base_opts (dict or None) –

    Default opts to affect the lassoing, to be merged with defaults; uses get_default_opts().

    Read the code to be sure what are the available choices :-(.

  • available_filters (dict or None) – The filters available for a xl-ref to use. (xleash.installed_filters used if unspecified).

For instance, to make your own sheets-factory and override options, you may do this:

>>> from pandalone import xleash

>>> with xleash.SheetsFactory() as sf:
...     xleash.make_default_Ranger(sf, base_opts={'lax': True})
<pandalone.xleash._lasso.Ranger object at
...

5.2. Module: pandalone.mappings

Hierarchical string-like objects useful for indexing, that can be rename/relocated at a later stage.

Pstep Automagically-constructed relocatable paths for accessing data-tree.
pmods_from_tuples(pmods_tuples) Turns a list of 2-tuples into a pmods hierarchy.
Pmod([_alias, _steps, _regxs]) A path-step mapping forming the pmods-hierarchy.

Example:

>>> from pandalone.mappings import pmods_from_tuples

>>> pmods = pmods_from_tuples([
...     ('',         'deeper/ROOT'),
...     ('/abc',     'ABC'),
...     ('/abc/foo', 'BAR'),
... ])
>>> p = pmods.step()
>>> p.abc.foo
`BAR`
>>> p._paths()
['deeper/ROOT/ABC/BAR']
  • TODO: Implement “anywhere” pmods (//).
class pandalone.mappings.Pmod(_alias=None, _steps={}, _regxs={})[source]

Bases: object

A path-step mapping forming the pmods-hierarchy.

  • The pmods denotes the hierarchy of all mappings, that either rename or relocate path-steps.

  • A single mapping transforms an “origin” path to a “destination” one (also called the “from” and “to” paths).

  • A mapping always transforms the final path-step, like that:

    FROM_PATH       TO_PATH       RESULT_PATH
    ---------       -------       -----------
    /rename/path    foo       --> /rename/foo        ## renaming
    /relocate/path  foo/bar   --> /relocate/foo/bar  ## relocation
    ''              a/b/c     --> /a/b/c             ## Relocate all paths.
    /               a/b/c     --> /a/b/c             ## Relocates 1st "empty-str" step.
    
  • The pmod is the mapping of that single path-step.

  • It is possible to match fully on path-steps using regular-expressions, and then to use any captured-groups from the final step in the mapped value:

    (/all(.*)/path, foo)   + all_1/path --> /all_1/foo
                           + all_XYZ    --> /all_XYZ        ## no change
    (/all(.*)/path, foo\1) + all_1/path --> /all_1/foo_1
    

    If more than one regex match, they are merged in the order declared (the latest one overrides a previous one).

  • Any exact child-name matches are applied and merged after regexs.

  • Use pmods_from_tuples() to construct the pmods-hierarchy.

  • The pmods are used internally by Pstep to map the component-paths of their input & output onto the actual value-tree paths.

Variables:
  • _alias (str) – (optional) the mapped-name of the pstep for this pmod
  • _steps (dict) – {original_name –> pmod}
  • _regxs (OrderedDict) – {regex_on_originals –> pmod}

Example:

Note

Do not manually construct instances from this class! To construct a hierarchy use the pmods_from_tuples().

You can use it to massively map paths, either for renaming them:

>>> pmods = pmods_from_tuples([
...         ('/a',           'A'),
...         ('/~b.*',        r'BB\g<0>'),  ## Previous match.
...         ('/~b.*/~c.(.*)', r'W\1ER'),    ## Capturing-group(1)
... ])
>>> pmods.map_paths(['/a', '/a/foo'])     ## 1st rule
['/A', '/A/foo']

>>> pmods.map_path('/big/stuff')          ## 2nd rule
'/BBbig/stuff'

>>> pmods.map_path('/born/child')         ## 2nd & 3rd rule
'/BBborn/WildER'

or to relocate them:

>>> pmods = pmods_from_tuples([
...         ('/a',           'A/AA'),
...         ('/~b.*/~c(.*)',  r'../C/\1'),
...         ('/~b.*/~.*/~r.*', r'/\g<0>'),
... ])
>>> pmods.map_paths(['/a/foo', '/big/child', '/begin/from/root'])
['/A/AA/foo', '/big/C/hild', '/root']

Here is how you relocate “root” (notice that the '' path is the root):

>>> pmods = pmods_from_tuples([('', '/NEW/ROOT')])
>>> pmods.map_paths(['/a/foo', ''])
['/NEW/ROOT/a/foo', '/NEW/ROOT']
__eq__(o)[source]

Return self==value.

__init__(_alias=None, _steps={}, _regxs={})[source]

Args passed only for testing; remember, _regxs must be a (k, v) tuple-list!

Note

Volatile arg-defaults (empty dicts) are knowingly used to preserve memory; never append to them!

__repr__()[source]

Return repr(self).

_append_into_regxs(key)[source]

Inserts a child-mapping into the _regxs dict.

Parameters:key (str) – the regex-pattern to add
_append_into_steps(key)[source]

Inserts a child-mapping into the _steps dict.

Parameters:key (str) – the step-name to add
_match_regxs(cstep)[source]

Return (pmod, regex.match) for those child-pmods matching cstep.

_merge(other)[source]

Clone this pmod and override all its props with props from the other pmod, recursively.

Although it does not modify this, the other, or their children pmods, it may “share” (crosslink) them, so pmods MUST NOT be modified later.

Parameters:other (Pmod) – contains the dicts with the overrides
Returns:the cloned merged pmod
Return type:Pmod

Examples:

Look how _steps are merged:

>>> pm1 = Pmod(_alias='pm1', _steps={
...     'a':Pmod(_alias='A'), 'c':Pmod(_alias='C')})
>>> pm2 = Pmod(_alias='pm2', _steps={
...     'b':Pmod(_alias='B'), 'a':Pmod(_alias='AA')})
>>> pm = pm1._merge(pm2)
>>> sorted(pm._steps.keys())
['a', 'b', 'c']

And here it is _regxs merging, which preserves order:

>>> pm1 = Pmod(_alias='pm1',
...            _regxs=[('d', Pmod(_alias='D')),
...                    ('a', Pmod(_alias='A')),
...                    ('c', Pmod(_alias='C'))])
>>> pm2 = Pmod(_alias='pm2',
...            _regxs=[('b', Pmod(_alias='BB')),
...                    ('a', Pmod(_alias='AA'))])

>>> pm1._merge(pm2)
pmod('pm2', OrderedDict([(re.compile('d'), pmod('D')),
           (re.compile('c'), pmod('C')),
           (re.compile('b'), pmod('BB')),
           (re.compile('a'), pmod('AA'))]))

>>> pm2._merge(pm1)
pmod('pm1', OrderedDict([(re.compile('b'), pmod('BB')),
            (re.compile('d'), pmod('D')),
            (re.compile('a'), pmod('A')),
            (re.compile('c'), pmod('C'))]))
_override_regxs(other)[source]

Override this pmod’s _regxs dict with other’s, recursively.

  • It may “share” (crosslink) the dict and/or its child-pmods between the two pmod args (self and other).
  • No dict is modified (apart from self, which must have been cloned previously by Pmod._merge()), to avoid side-effects in case they were “shared”.
  • It preserves dict-ordering so that other order takes precedence (its elements are the last ones).
Parameters:
  • self (Pmod) – contains the dict that would be overridden
  • other (Pmod) – contains the dict with the overrides
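The ordering rule above (the other's order takes precedence, its elements coming last) can be sketched with a plain dict merge; this is illustrative only, assuming insertion-ordered dicts:

```python
def override_ordered(base, overrides):
    """Re-insert overridden keys last, so the overrides' ordering wins."""
    merged = {k: v for k, v in base.items() if k not in overrides}
    merged.update(overrides)  # overridden keys land at the end
    return merged
```

For example, merging `{'d': 1, 'a': 2, 'c': 3}` with overrides `{'b': 4, 'a': 5}` yields key order `d, c, b, a`, matching the ordering shown in the _merge() doctests.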
_override_steps(other)[source]

Override this pmod’s ‘_steps’ dict with other’s, recursively.

Same as _override_regxs() but without caring for order.

alias(cstep)[source]

Like descend() but without merging child-pmods.

Returns:the expanded alias from child/regexs or None
descend(cstep)[source]

Return the child-pmod with any exact child merged with all matched regexps, along with its regex-expanded alias.

Parameters:cstep (str) – the child path-step cstep of the pmod to return
Returns:the merged-child pmod, along with the alias; both might be None, if nothing matched, or no alias.
Return type:tuple(Pmod, str)

Example:

>>> pm = Pmod(
...     _steps={'a': Pmod(_alias='A')},
...     _regxs=[('a\w*', Pmod(_alias='AWord')),
...              ('a(\d*)', Pmod(_alias=r'A_\1')),
...    ])
>>> pm.descend('a')
(pmod('A'), 'A')

>>> pm.descend('abc')
(pmod('AWord'), 'AWord')

>>> pm.descend('a12')
(pmod('A_\\1'), 'A_12')

>>> pm.descend('BAD')
(None, None)

Notice how children of regexps are merged together:

>>> pm = Pmod(
...     _steps={'a':
...        Pmod(_alias='A', _steps={1: 11})},
...     _regxs=[
...        (r'a\w*', Pmod(_alias='AWord',
...                      _steps={2: Pmod(_alias=22)})),
...        (r'a\d*', Pmod(_alias='ADigit',
...                     _steps={3: Pmod(_alias=33)})),
...    ])
>>> sorted(pm.descend('a')[0]._steps)    ## All children and regexps match.
[1, 2, 3]

>>> pm.descend('aa')[0]._steps           ## Only 'a\w*' matches.
{2: pmod(22)}

>>> sorted(pm.descend('a1')[0]._steps )  ## Both regexps matches.
[2, 3]

So it is possible to say:

>>> pm.descend('a1')[0].alias(2)
22
>>> pm.descend('a1')[0].alias(3)
33
>>> pm.descend('a1')[0].descend('BAD')
(None, None)
>>> pm.descend('a$')
(None, None)

but it is better to use map_path() for this.

map_path(path)[source]

Maps a ‘/rooted/path’ using all aliases while descending its child pmods.

It uses any aliases on all child pmods if found.

Parameters:path (str) – a rooted path to transform
Returns:the rooted mapped path or ‘/’ if path was ‘/’
Return type:str or None

Examples:

>>> pmods = pmods_from_tuples([
...         ('/a',              'A/AA'),
...         ('/~a(\\w*)',       r'BB\1'),
...         ('/~a\\w*/~d.*',     r'D \g<0>'),
...         ('/~a(\\d+)',       r'C/\1'),
...         ('/~a(\\d+)/~(c.*)', r'CC-/\1'), # The 1st group is ignored!
...         ('/~a\\d+/~e.*',     r'/newroot/\g<0>'), # Rooted mapping.
... ])

>>> pmods.map_path('/a')
'/A/AA'

>>> pmods.map_path('/a_hi')
'/BB_hi'

>>> pmods.map_path('/a12')
'/C/12'

>>> pmods.map_path('/a12/etc')
'/newroot/etc'

Notice how children from all matching prior-steps are merged:

>>> pmods.map_path('/a12/dow')
'/C/12/D dow'
>>> pmods.map_path('/a12/cow')
'/C/12/CC-/cow'

To map the root use ‘’, which matches before the 1st slash (‘/’):

>>> pmods = pmods_from_tuples([('', 'New/Root'),])  ## Relative
>>> pmods
pmod({'': pmod('New/Root')})

>>> pmods.map_path('/for/plant')
'New/Root/for/plant'

>>> pmods_from_tuples([('', '/New/Root'),]).map_path('/for/plant')
'/New/Root/for/plant'

Note

Using slash(‘/’) for “from” path will NOT map root:

>>> pmods = pmods_from_tuples([('/', 'New/Root'),])
>>> pmods
pmod({'': pmod({'': pmod('New/Root')})})

>>> pmods.map_path('/for/plant')
'/for/plant'

>>> pmods.map_path('//for/plant')
'/New/Root/for/plant'


but ‘’ always remains unchanged (whole document):

>>> pmods.map_path('')
''
step(pname='', alias=None)[source]

Create a new Pstep having as mappings this pmod.

If no pname specified, creates a root pstep.

Delegates to Pstep.__new__().

class pandalone.mappings.Pstep[source]

Bases: str

Automagically-constructed relocatable paths for accessing data-tree.

The “magic” autocreates psteps as they are referenced, making it natural to write code that accesses data-tree paths, while at the same time the “model” of those tree-data gets discovered.

Each pstep keeps internally the name of a data-tree step, which, when created through recursive referencing, coincides with the parent’s branch leading to this step. That name can be modified with Pmod, so the same data-accessing code can refer to differently-named values in the data-tree.

Variables:
  • _csteps (dict) – the child-psteps by their name (default None)
  • _pmod (dict) – path-modifications used to construct this and relayed to children (default None)
  • _locked (int) – one of - Pstep.CAN_RELOCATE (default), - Pstep.CAN_RENAME, - Pstep.LOCKED (neither from the above).
  • _tags (set) – A set of strings (default ())
  • _schema (dict) – json-schema data.

See __new__() for interal constructor.

Usage:

  • Use Pmod.step() to construct a root pstep from mappings. Specify a string argument to construct a relative pstep-hierarchy.

  • Just referencing (non_private) attributes, creates them.

  • Private attributes and functions (starting with _) exist for specific operations (i.e. for specifying json-schema, or for collecting all paths).

  • Assignments are only allowed for string-values, or to private attributes:

    >>> p = Pstep()
    >>> p.assignments = 12
    Traceback (most recent call last):
    AssertionError: Cannot assign '12' to '/assignments!
    
    >>> p._but_hidden = 'Ok'
    
  • Use _paths() to get all defined paths so far.

  • Construction:

    >>> Pstep()
    ``
    >>> Pstep('a')
    `a`
    

    Notice that psteps are surrounded with the back-tick char (‘`’).

  • Paths are created implicitly as they are referenced:

    >>> m = {'a': 1, 'abc': 2, 'cc': 33}
    >>> p = Pstep('a')
    >>> assert m[p] == 1
    >>> assert m[p.abc] == 2
    >>> assert m[p.a321.cc] == 33
    
    >>> sorted(p._paths())
    ['a/a321/cc', 'a/abc']
    
  • Any “path-mappings” or “pmods” may be specified during construction:

    >>> from pandalone.mappings import pmods_from_tuples
    
    >>> pmods = pmods_from_tuples([
    ...     ('',         'deeper/ROOT'),
    ...     ('/abc',     'ABC'),
    ...     ('/abc/foo', 'BAR'),
    ... ])
    >>> p = pmods.step()
    >>> p.abc.foo
    `BAR`
    >>> p._paths()
    ['deeper/ROOT/ABC/BAR']
    
  • but exceptions are thrown if mapping any step marked as “locked”:

    >>> p.abc.foo._locked  ## 3: CAN_RELOCATE
    3
    
    >>> p.abc.foo._lock    ## Screams, because `foo` is already mapped.
    Traceback (most recent call last):
    ValueError: Cannot rename/relocate 'foo'-->'BAR' due to LOCKED!
    
  • Warning

    Creating an empty('') step in some paths will “root” the path:

    >>> p = Pstep()
    >>> _ = p.a1.b
    >>> _ = p.A2
    >>> p._paths()
    ['/A2', '/a1/b']
    
    >>> _ = p.a1.a2.c
    >>> _ = p.a1.a2 = ''
    >>> p._paths()
    ['/A2', '/a1/b', '/c']
    
__dir__() → list[source]

default dir() implementation

static __new__(cls, pname='', _proto_or_pmod=None, alias=None)[source]

Constructs a string with str-content which may come from the mappings.

These are the valid argument combinations:

pname='attr_name',
pname='attr_name', _alias='Mass [kg]'

pname='attr_name', _proto_or_pmod=Pmod

pname='attr_name', _proto_or_pmod=Pstep
pname='attr_name', _proto_or_pmod=Pstep, _alias='Mass [kg]'
Parameters:
  • pname (str) – this pstep’s name, which must coincide with the name of the parent-pstep’s attribute holding this pstep. It is stored at _orig and, if there is no alias and it is unmapped by pmod, this becomes the alias.
  • _proto_or_pmod (Pmod or Pstep) –

    It can be either:

    • the mappings for this pstep,
    • another pstep to clone attributes from (used when replacing an existing child-pstep), or
    • None.

    The mappings will apply only if Pmod.descend() matches pname, and will derive the alias.

  • alias (str) – Will become the super-str object when no mappings are specified (_proto_or_pmod is a dict from some prototype pstep). It gets jsonpointer-escaped if it exists (see pandata.escape_jsonpointer_part())
__repr__()[source]

Return repr(self).

__setattr__(attr, value)[source]

Implement setattr(self, name, value).

_derrive_map_tuples()[source]

Recursively extract (cmap --> alias) pairs from the pstep-hierarchy.

Parameters:
  • pairs (list) – Where to append subtree-paths built.
  • prefix_steps (tuple) – the branch currently being visited
Return type:

[(str, str)]

_fix

Sets locked = CAN_RENAME.

Returns:self, for chained use
Raise:ValueError if the pstep has been relocated

_iter_hierarchy(prefix_steps=())[source]

Breadth-first traversing of pstep-hierarchy.

Parameters:prefix_steps (tuple) – the branch currently being visited.
Returns:yields the visited pstep along with its path (including it)
Return type:(Pstep, [Pstep])
_lock

Sets locked = LOCKED.

Returns:self, for chained use
Raise:ValueError if the pstep has been renamed/relocated

_locked

Gets the _locked internal flag, or screams on set when the step has already been renamed/relocated.

Prefer using one of _fix or _lock instead.

Parameters:locked – One of CAN_RELOCATE, CAN_RENAME, LOCKED.
Raise:ValueError when stricter lock-value on a renamed/relocated pstep
_paths(with_orig=False, tag=None)[source]

Return all children-paths (str-list) constructed so far, in a list.

Parameters:
  • with_orig (bool) – whether to also include the orig-path, for debugging.
  • tag (str) – If not ‘None’, fetches all paths with tag in their last step.
Return type:

[str]

Examples:

>>> p = Pstep()
>>> _ = p.a1._tag('inp').b._tag('inp').c
>>> _ = p.a2.b2

>>> p._paths()
['/a1/b/c', '/a2/b2']

>>> p._paths(tag='inp')
['/a1', '/a1/b']

For debugging set with_orig to True:

>>> pmods = pmods_from_tuples([
...     ('',         'ROOT'),
...     ('/a',     'A/AA'),
... ])
>>> p = pmods.step()
>>> _ = p.a.b
>>> p._paths(with_orig=True)
['(-->ROOT)/(a-->A/AA)/b']
_schema

Updates json-schema-v4 on this pstep (see JSchema).

_schema_exists()[source]

Always use this to avoid needless schema-instantiations.

_tag(tag)[source]

Add a “tag” for this pstep.

Returns:self, for chained use
_tag_remove(tag)[source]

Delete a “tag” from this pstep.

Returns:self, for chained use
pandalone.mappings._append_step(steps, step)[source]

Joins step at the right of steps, respecting ‘/’, ‘..’, ‘.’, ‘’.

Parameters:
  • steps (tuple) – where to append into (“absolute” when 1st-element is ‘’)
  • step (str) – what to append (may be: 'foo', '.', '..', '')
Return type:

tuple

Note

The empty-string(‘’) is the “root” for both steps and step. An empty-tuple steps is considered “relative”, equivalent to dot().

Example:

>>> _append_step((), 'a')
('a',)

>>> _append_step(('a', 'b'), '..')
('a',)

>>> _append_step(('a', 'b'), '.')
('a', 'b')

Note that an “absolute” path has the 1st-step empty(''), (so the previous paths above were all “relative”):

>>> _append_step(('a', 'b'), '')
('',)

>>> _append_step(('',), '')
('',)

>>> _append_step((), '')
('',)

Dot-dots preserve “relative” and “absolute” paths, respectively, and hence do not coalesce when at the left:

>>> _append_step(('',), '..')
('',)

>>> _append_step(('',), '.')
('',)

>>> _append_step(('a',), '..')
()

>>> _append_step((), '..')
('..',)

>>> _append_step(('..',), '..')
('..', '..')

>>> _append_step((), '.')
()

Single-dots(‘.’) just disappear:

>>> _append_step(('.',), '.')
()

>>> _append_step(('.',), '..')
('..',)
pandalone.mappings._clone_attrs(obj)[source]

Clone deeply any collection attributes of the passed-in object.
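Such a deep-cloning of only the collection-valued attributes could be sketched like this (illustrative only, not pandalone’s implementation; the helper name is hypothetical):

```python
import copy

def clone_collection_attrs(obj):
    # Deep-copy only the collection-valued attributes of `obj`,
    # leaving scalar attributes untouched (illustrative sketch).
    for name, value in list(vars(obj).items()):
        if isinstance(value, (list, dict, set)):
            setattr(obj, name, copy.deepcopy(value))
    return obj
```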

pandalone.mappings._forbidden_pstep_attrs = ('get_values', 'Series')

Pstep attributes excluded from magic-creation, because they are probed by pandas’ indexing code.

pandalone.mappings._join_paths(*steps)[source]

Joins all path-steps in a single string, respecting '/', '..', '.', ''.

Parameters:steps (str) – single json-steps, from left to right
Return type:str

Note

If you use iter_jsonpointer_parts_relaxed() to generate path-steps, the “root” is signified by the empty('') step; not the slash(/)!

Hence a lone slash(/) gets split into an empty step after “root”, like that: ('', ''), which generates just “root”('').

Therefore a “folder” (i.e. some/folder/) when split equals ('some', 'folder', ''), which results again in the “root”('')!

Examples:

>>> _join_paths('r', 'a', 'b')
'r/a/b'

>>> _join_paths('', 'a', 'b', '..', 'bb', 'cc')
'/a/bb/cc'

>>> _join_paths('a', 'b', '.', 'c')
'a/b/c'

An empty-step “roots” the remaining path-steps:

>>> _join_paths('a', 'b', '', 'r', 'aa', 'bb')
'/r/aa/bb'

All steps must already be “split”:

>>> _join_paths('a', 'b', '../bb')
'a/b/../bb'

Dot-dots preserve “relative” and “absolute” paths, respectively:

>>> _join_paths('..')
'..'

>>> _join_paths('a', '..')
'.'

>>> _join_paths('a', '..', '..', '..')
'../..'

>>> _join_paths('', 'a', '..', '..')
''

Some more special cases:

>>> _join_paths('..', 'a')
'../a'

>>> _join_paths('', '.', '..', '..')
''

>>> _join_paths('.', '..')
'..'

>>> _join_paths('..', '.', '..')
'../..'

See also

_append_step

pandalone.mappings.pmods_from_tuples(pmods_tuples)[source]

Turns a list of 2-tuples into a pmods hierarchy.

  • Each tuple defines the renaming or relocation of the final part of some component path onto another one within the value-trees, such as:

    (/rename/path, foo)          --> rename/foo
    (relocate/path, foo/bar)    --> relocate/foo/bar
    
  • The “from” path may be: relative, absolute (starting with /), or “anywhere” (starting with //).

  • In case a “step” in the “from” path starts with a tilde('~'), it is assumed to be a regular-expression, and the tilde is removed from it. The “to” path can make use of any “from” capture-groups:

    ('/~all(.*)/path', 'foo')
    ('~some[\d+]/path', 'foo')
    ('//~all(.*)/path', 'foo')
    
Parameters:pmods_tuples (list(tuple(str, str))) –
Returns:a root pmod
Return type:Pmod

Example:

>>> pmods_from_tuples([
...     ('/a', 'A1/A2'),
...     ('/a/b', 'B'),
... ])
pmod({'': pmod({'a': pmod('A1/A2', {'b': pmod('B')})})})

>>> pmods_from_tuples([
...     ('/~a*', 'A1/A2'),
...     ('/a/~b[123]', 'B'),
... ])
pmod({'': pmod({'a':
        pmod(OrderedDict([(re.compile('b[123]'), pmod('B'))]))},
             OrderedDict([(re.compile('a*'), pmod('A1/A2'))]))})

This is how you map root:

>>> pmods = pmods_from_tuples([
...     ('', 'relative/Root'),        ## Make all paths relatives.
...     ('/a/b', '/Rooted/B'),        ## But `b` becomes "rooted".
... ])
>>> pmods
pmod({'':
        pmod('relative/Root',
                {'a': pmod({'b':
                        pmod('/Rooted/B')})})})

>>> pmods.map_path('/a/c')
'relative/Root/a/c'

>>> pmods.map_path('/a/b')
'/Rooted/B'

But note that ‘/’ maps the 1st “empty-str” step after root:

>>> pmods_from_tuples([
...     ('/', 'New/Root'),
... ])
pmod({'': pmod({'': pmod('New/Root')})})

TODO: Implement “anywhere” matches.

pandalone.mappings.pstep_from_df(columns_df, name_col='names')[source]

Creates a Pstep instance from a dataframe.

Parameters:columns_df (pd.DataFrame) –

pstep’s mapped-names in the name_col column, indexed by paths, and any additional pstep-attributes in the remaining columns.

example:

========  =========  ===================
paths     names      renames
========  =========  ===================
/A        foo        ['FOO', 'LL']
/B        bar        []
========  =========  ===================
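The columns_df of the table above could be constructed like this (a sketch; the 'renames' column is an illustrative extra pstep-attribute, not part of the documented API):

```python
import pandas as pd

# Sketch of the `columns_df` argument shown in the table above;
# only the 'names' column (name_col) is documented, 'renames' is illustrative.
columns_df = pd.DataFrame(
    {'names': ['foo', 'bar'],
     'renames': [['FOO', 'LL'], []]},
    index=pd.Index(['/A', '/B'], name='paths'))
```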

5.3. Module: pandalone.components

Defines the building-blocks of a “model”:

components and assemblies:
See Component, FuncComponent and Assembly.
paths and path-mappings (pmods):
See Pmod, pmods_from_tuples() and Pstep.

5.3.1. TODO

  1. Assembly use ComponentLoader collecting components with:
    • getattr() and
    • filter_predicate default to attr.__name__.startswith('cfunc_').
    • enforce a disable flag on them.
  2. Component/assembly should have a stackable or common cwd?
  3. Components should be easy to run without a “framework”:
    • _build() –> run()
    • pmods on init OR run()?
    • As ContextManager?
  4. Imply a default Assembly.
class pandalone.components.Assembly(components, name=None)[source]

Bases: pandalone.components.Component

Example:

>>> def cfunc_f1(comp, value_tree):
...     comp.pinp().A
...     comp.pout().B
>>> def cfunc_f2(comp, value_tree):
...     comp.pinp().B
...     comp.pout().C
>>> ass = Assembly(FuncComponent(cfunc) for cfunc in [cfunc_f1, cfunc_f2])
>>> ass._build()
>>> assert list(ass._iter_validations()) == []
>>> ass._inp
['f1/A', 'f2/B']
>>> ass._out
['f1/B', 'f2/C']
>>> from pandalone.mappings import pmods_from_tuples
>>> pmod = pmods_from_tuples([
...     ('~.*',  '/root'),
... ])
>>> ass._build(pmod)
>>> sorted(ass._inp + ass._out)
['/root/A', '/root/B', '/root/B', '/root/C']
__call__(*args, **kws)[source]

Call self as a function.

__init__(components, name=None)[source]

Initialize self. See help(type(self)) for accurate signature.

_build(pmod=None)[source]

Invoked once before run-time and should apply pmaps when given.

class pandalone.components.Component(name)[source]

Bases: object

Encapsulates a function and its inputs/outputs dependencies.

It should be callable, and when executed it may read/modify the data-tree given as its 1st input.

An opportunity to fix the internal-state (i.e. inputs/outputs/name) is when _build() is invoked.

Variables:
  • _name (list) – identifier
  • _inp (list) – list/of/paths required on the data-tree (must not overlap with out)
  • _out (list) – list/of/paths modified on the data-tree (must not overlap with inp)

Mostly defined through cfuncs, which provide for defining a component with a single function with a special signature, see FuncComponent.

__call__(*args, **kws)[source]

Call self as a function.

__init__(name)[source]

Initialize self. See help(type(self)) for accurate signature.

__metaclass__

alias of abc.ABCMeta

_build(pmod=None)[source]

Invoked once before run-time and should apply pmaps when given.

_iter_validations()[source]

Yields a msg for each failed validation rule.

Invoke it after _build() component.

class pandalone.components.FuncComponent(cfunc, name=None)[source]

Bases: pandalone.components.Component

Converts a “cfunc” into a component.

A cfunc is a function that modifies the values-tree with this signature:

cfunc_XXXX(comp, vtree)

where:

comp:
the FuncComponent associated with the cfunc
vtree:
the part of the data-tree involving the values to be modified by the cfunc

It also works as a utility for developers of cfuncs, since it is passed as their 1st arg.

Cfuncs may use pinp() and pout() to access their input and output data-tree values, respectively. Note that accessing any of those attributes from outside a cfunc results in an error.

If a cfunc accesses additional values with “fixed” paths, then it has to manually add those paths into the _inp and _out lists.

Example:

This would be a fully “relocatable” cfunc:

>>> def cfunc_calc_foobar_rate(comp, value_tree):
...     pi = comp.pinp()
...     po = comp.pout()
...
...     df = value_tree.get(pi)
...
...     df[po.Acc] = df[pi.V] / df[pi.T]

To get the unmodified component-paths, use:

>>> comp = FuncComponent(cfunc_calc_foobar_rate)
>>> comp._build()
>>> assert list(comp._iter_validations()) == []
>>> sorted(comp._inp + comp._out)
['calc_foobar_rate/Acc', 'calc_foobar_rate/T', 'calc_foobar_rate/V']

To get the path-modified component-paths, use:

>>> from pandalone.mappings import pmods_from_tuples

>>> pmods = pmods_from_tuples([
...     ('~.*', '/A/B'),
... ])
>>> comp._build(pmods)

>>> sorted(comp.pinp()._paths())
['/A/B/T', '/A/B/V']

>>> comp.pout()._paths()
['/A/B/Acc']

>>> sorted(comp._inp + comp._out)
['/A/B/Acc', '/A/B/T', '/A/B/V']

>>> comp._build(pmods)
>>> sorted(comp._inp + comp._out)
['/A/B/Acc', '/A/B/T', '/A/B/V']
__call__(*args, **kws)[source]

Call self as a function.

__init__(cfunc, name=None)[source]

Initialize self. See help(type(self)) for accurate signature.

_build(pmod=None)[source]

Extracts inputs/outputs from cfunc.

pinp(path=None)[source]

The suggested Pstep for cfunc to use to access inputs.

pout(path=None)[source]

The suggested Pstep for cfunc to use to access outputs.

5.4. Module: pandalone.pandata

A pandas-model is a tree of strings, numbers, sequences, dicts, pandas instances and resolvable URI-references, implemented by Pandel.

class pandalone.pandata.JSONCodec[source]

Bases: object

Json coders/decoders capable of handling (almost) all python objects, by pickling them.

Example:

>>> import json
>>> obj_list = [
...    3.14,
...    {
...         'aa': pd.DataFrame([]),
...         2: np.array([]),
...         33: {'foo': 'bar'},
...     },
...     pd.DataFrame(np.random.randn(10, 2)),
...     ('b', pd.Series({})),
... ]
>>> for o in obj_list + [obj_list]:
...     s = json.dumps(o, cls=JSONCodec.Encoder)
...     oo = json.loads(s, cls=JSONCodec.Decoder)
...     assert trees_equal(o, oo)
...
class Decoder(object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)[source]

Bases: json.decoder.JSONDecoder

decode(s)[source]

Return the Python representation of s (a str instance containing a JSON document).

class Encoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: json.encoder.JSONEncoder

encode(o)[source]

Return a JSON string representation of a Python data structure.

>>> from json.encoder import JSONEncoder
>>> JSONEncoder().encode({"foo": ["bar", "baz"]})
'{"foo": ["bar", "baz"]}'
class pandalone.pandata.JSchema[source]

Bases: object

Facilitates the construction of json-schema-v4 nodes on Pstep code.

It performs only rudimentary argument-name checks. Further validations should apply using a proper json-schema validator.

class pandalone.pandata.ModelOperations[source]

Bases: pandalone.pandata.ModelOperations

Customization functions for traversing, I/O, and converting self-or-descendant branch (sub)model values.

static __new__(cls, inp=None, out=None, conv=None)[source]
Parameters:
  • inp (list) – the args-list to Pandel._read_branch()
  • out

    The args to Pandel._write_branch(), that may be specified either as:

    • an args-list, that will apply for all model data-types (lists, dicts & pandas),
    • a map of type –> args-list, where the None key is the catch-all case,
    • a function returning the args-list for some branch-value, with signature: def get_write_branch_args(branch).
  • conv

    The conversion-functions (convertors) for the various model’s data-types. The convertors have signature def convert(branch), and they may be specified either as:

    • a map of (from_type, to_type) –> conversion_func(), where the None key is the catch-all case,
    • a “master-switch” function returning the appropriate convertor depending on the requested conversion. The master-function’s signature is def get_convertor(from_branch, to_branch).

    The minimum convertors demanded by Pandel are (at least, check the code for more):

    • DataFrame <–> dict
    • Series <–> dict
    • ndarray <–> list
class pandalone.pandata.Pandel(curate_funcs=())[source]

Bases: object

Builds, validates and stores a pandas-model, a mergeable stack of JSON-schema abiding trees of strings and numbers, assembled with add_submodel().

Overview

The making of a model involves, among other things, schema-validating, reading subtree-branches from URIs, cloning, converting and merging multiple sub-models in a single unified-model tree, without side-effecting the given input. All these happen in 4+1 steps:

              ....................... Model Construction .................
 ------------ :  _______    ___________                                  :
/ top_model /==>|Resolve|->|PreValidate|-+                               :
-----------'  : |___0___|  |_____1_____| |                               :
 ------------ :  _______    ___________  |   _____    ________    ______ :   --------
/ base-model/==>|Resolve|->|PreValidate|-+->|Merge|->|Validate|->|Curate|==>/ model /
-----------'  : |___0___|  |_____1_____|    |_ 2__|  |___3____|  |__4+__|:  -------'
              ............................................................

All steps are executed “lazily” using generators (with yield). Before proceeding to the next step, the previous one must have completed successfully. That way, any ad-hoc code in building-step-5(curation), for instance, will not suffer a horrible death due to badly-formed data.
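The fail-fast chaining of generator-steps described above can be sketched with plain generators (illustrative only, not pandalone’s actual code):

```python
def run_steps(steps, model):
    """Exhaust each build-step (a generator of problems) and halt
    before the next step if the previous one yielded any."""
    for i, step in enumerate(steps):
        problems = list(step(model))
        if problems:
            yield from problems
            yield 'Gave-up building model after step %i.' % i
            return

def ok_step(model):
    return iter(())            # a step reporting no problems

def failing_step(model):
    yield 'bad data'           # a step reporting one problem

errors = list(run_steps([ok_step, failing_step, ok_step], model={}))
# -> ['bad data', 'Gave-up building model after step 1.']
```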

[TODO] The storing of a model simply involves distributing model parts into different files and/or formats, again without side-effecting the unified-model.

Building model

Here is a detailed description of each building-step:

  1. _resolve() and substitute any json-references present in the submodels with content-fragments fetched from the referred URIs. The submodels are cloned first, to avoid side-effecting them.

    Although by default a combination of JSON and CSV files is expected, this can be customized, either by the content in the json-ref, within the model (see below), or as explained below.

    The extended json-refs syntax supported provides for passing arguments into _read_branch() and _write_branch() methods. The syntax is easier to explain by showing what the default _global_cntxt corresponds to, for a DataFrame:

    {
      "$ref": "http://example.com/example.json#/foo/bar",
      "$inp": ["AUTO"],
      "$out": ["CSV", "encoding=UTF-8"]
    }
    

    And here is what is required to read and (later) store into an HDF5 local file with a predefined name:

    {
      "$ref": "file://./filename.hdf5",
      "$inp": ["AUTO"],
      "$out": ["HDF5"]
    }
    

    Warning

    Step NOT IMPLEMENTED YET!

  2. Loosely _prevalidate() each sub-model separately with json-schema, where any pandas-instances (DataFrames and Series) are left as is. It is the duty of the developer to ensure that the prevalidation-schema is loose enough that it allows for various submodel-forms, prior to merging, to pass.

  3. Recursively clone and _merge() sub-models in a single unified-model tree. Branches from sub-models higher in the stack override the respective ones from the sub-models below, recursively. Different object types need to be converted appropriately (i.e. merging a dict with a DataFrame results in a DataFrame, so the dictionary has to be converted to a dataframe).

    The required conversions into pandas classes can be customized as explained below. Series and DataFrames cannot merge together, and Sequences do not merge with any other object-type (themselves included), they just “overwrite”.

    The default convertor-functions defined both for submodels and models are listed in the following table:

    =========  =========  ==============================
    From       To         Method
    =========  =========  ==============================
    dict       DataFrame  pd.DataFrame (the constructor)
    DataFrame  dict       lambda df: df.to_dict('list')
    dict       Series     pd.Series (the constructor)
    Series     dict       lambda sr: sr.to_dict()
    =========  =========  ==============================
  4. Strictly json-_validate() the unified-model (i.e. enforcing required schema-rules).

    The required conversions from pandas classes can be customized as explained below.

    The default convertor-functions are the same as above.

  5. (Optionally) Apply the _curate() functions on the model to enforce dependencies and/or any ad-hoc generation-rules among the data. You can think of bash-like expansion patterns, like ${/some/path:=$HOME} or expressions like %len(../other/path).
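The default convertors listed in the table of step 3 behave like this plain-pandas sketch (not pandalone code):

```python
import pandas as pd

# dict --> DataFrame uses the constructor; DataFrame --> dict round-trips
# through `to_dict('list')`, as listed in the convertor table of step 3.
d = {'V': [1, 2], 'N': [3, 4]}
df = pd.DataFrame(d)                  # dict --> DataFrame
assert df.to_dict('list') == d        # DataFrame --> dict

sr = pd.Series({'a': 1})              # dict --> Series
assert sr.to_dict() == {'a': 1}       # Series --> dict
```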

Storing model

When storing model-parts, if unspecified, the filenames to write into will be deduced from the jsonpointer-path of the $out’s parent, by substituting “strange” chars with underscores('_').

Warning

Functionality NOT IMPLEMENTED YET!
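The filename-deduction rule just described could be sketched like this (the helper name and the exact set of “strange” chars are assumptions, since the functionality is not implemented yet):

```python
import re

def deduce_filename(json_path):
    # Substitute "strange" (non-word, non-dot) chars of the $out-parent's
    # jsonpointer-path with underscores('_'); illustrative sketch only.
    return re.sub(r'[^\w.]', '_', json_path.strip('/'))

deduce_filename('/df/sub/$out-parent')   # -> 'df_sub__out_parent'
```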

Customization

Some operations within steps (namely conversion and IO) can be customized by the following means (from lower to higher precedence):

  1. The global-default ModelOperations instance on the _global_cntxt, applied on both submodels and unified-model.

    For example to channel the whole reading/writing of models through HDF5 data-format, it would suffice to modify the _global_cntxt like that:

    pm = FooPandelModel()                        ## some concrete model-maker
    io_args = ["HDF5"]
    pm.mod_global_operations(inp=io_args, out=io_args)
    
  2. [TODO] Extra-properties on the json-schema applied on both submodels and unified-model for the specific path defined. The supported properties are the non-functional properties of ModelOperations.

  3. Specific-properties regarding IO operations within each submodel – see the resolve building-step, above.
  4. Context-maps of json_paths –> ModelOperations instances, installed by add_submodel() and unified_contexts on the model-maker. They apply to the self-or-descendant subtree of each model.

    The json_path is a string obeying a simplified json-pointer syntax (no char-normalizations yet), i.e. /some/foo/1/pointer. An empty-string('') matches the whole model.

    When multiple convertors match for a model-value, the selected convertor to be used is the most specific one (the one with longest prefix). For instance, on the model:

    [ { "foo": { "bar": 0 } } ]
    

    all of the following json_paths would match the 0 value:

        '', '/0', '/0/foo', '/0/foo/bar'

    but only the last one’s context-props will be applied.
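The “longest prefix wins” selection can be sketched like this (illustrative only; pandalone’s _select_context() implements the real rules, and plain string-prefix matching is a simplification):

```python
def select_context(path, contexts):
    # Pick the value whose json_path key is the longest prefix of `path`
    # (simplified: plain string-prefix matching, no step-boundary checks).
    matches = [p for p in contexts if path.startswith(p)]
    return contexts[max(matches, key=len)] if matches else None

contexts = {'': 'global-ops', '/0': 'elem-ops', '/0/foo': 'foo-ops'}
select_context('/0/foo/bar', contexts)   # -> 'foo-ops'
```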

Attributes

model

The model-tree that will receive the merged submodels after build() has been invoked. Depending on the submodels, the top-value can be any of the supported model data-types.

_submodel_tuples

The stack of (submodel, path_ops) tuples. The list’s 1st element is the base-model, the last one, the top-model. Use add_submodel() to build this list.

_global_cntxt

A ModelOperations instance acting as the global-default context for the unified-model and all submodels. Use mod_global_operations() to modify it.

_curate_funcs

The sequence of curate functions to be executed as the final step by _curate(). They are “normal” functions (not generators) with signature:

def curate_func(model_maker):
    pass      ## ie: modify ``model_maker.model``.

Better specify this list of functions on construction time.

_errored

An internal boolean flag that becomes True if any build-step has failed, to halt proceeding to the next one. It is None if build has not started yet.

Examples

The basic usage requires subclassing your own model-maker, if only so that a json-schema is provided for both validation-steps, 2 & 4:

>>> from collections import OrderedDict as od                           ## Json is better with stable keys-order
>>> class MyModel(Pandel):
...     def _get_json_schema(self, is_prevalidation):
...         return {                                                    ## Define the json-schema.
...             '$schema': 'http://json-schema.org/draft-04/schema#',
...             'required': [] if is_prevalidation else ['a', 'b'],     ## Prevalidation is more loose.
...             'properties': {
...                 'a': {'type': 'string'},
...                 'b': {'type': 'number'},
...                 'c': {'type': 'number'},
...             }
...         }

Then you can instantiate it and add your submodels:

>>> mm = MyModel()
>>> mm.add_submodel(od(a='foo', b=1))                                   ## submodel-1 (base)
>>> mm.add_submodel(pd.Series(od(a='bar', c=2)))                        ## submodel-2 (top-model)

You then have to build the final unified-model (any validation errors would be reported at this point):

>>> mdl = mm.build()

Note that you can also access the unified-model in the model attribute. You can now interrogate it:

>>> mdl['a'] == 'bar'                       ## Value overridden by top-model
True
>>> mdl['b'] == 1                           ## Value left intact from base-model
True
>>> mdl['c'] == 2                           ## New value from top-model
True

Let’s try to build with invalid submodels:

>>> mm = MyModel()
>>> mm.add_submodel({'a': 1})               ## According to the schema, this should have been a string,
>>> mm.add_submodel({'b': 'string'})        ## and this one, a number.
>>> sorted(mm.build_iter(), key=lambda ex: ex.message)    ## Fetch a list with all validation errors. 
[<ValidationError: "'string' is not of type 'number'">,
 <ValidationError: "1 is not of type 'string'">,
 <ValidationError: 'Gave-up building model after step 1.prevalidate (out of 4).'>]
>>> mdl = mm.model
>>> mdl is None                                     ## No model constructed, failed before merging.
True

And let’s try to build with valid submodels but an invalid merged-one:

>>> mm = MyModel()
>>> mm.add_submodel({'a': 'a str'})
>>> mm.add_submodel({'c': 1})
>>> sorted(mm.build_iter(), key=lambda ex: ex.message)  
[<ValidationError: "'b' is a required property">,
 <ValidationError: 'Gave-up building model after step 3.validate (out of 4).'>]
__init__(curate_funcs=())[source]
Parameters:curate_funcs (sequence) – See _curate_funcs.
__metaclass__

alias of abc.ABCMeta

_clone_and_merge_submodels(a, b, path='')[source]

Recursively merge b into a, cloning both.

_curate()[source]

Step-4: Invokes any curate-functions found in _curate_funcs.

_get_json_schema(is_prevalidation)[source]
Returns:a json schema, more loose when prevalidation for each case
Return type:dictionary
_merge()[source]

Step-2

_prevalidate()[source]

Step-1

_read_branch()[source]

Reads model-branches during resolve step.

_resolve()[source]

Step-0

_select_context(path, branch)[source]

Finds which context to use while visiting model-nodes, by enforcing the precedence-rules described in the Customization section.

Parameters:
  • path (str) – the branch’s jsonpointer-path
  • branch (str) – the actual branch’s node
Returns:

the selected ModelOperations

_validate()[source]

Step-3

_write_branch()[source]

Writes model-branches during distribute step.

add_submodel(model, path_ops=None)[source]

Pushes on top a submodel, along with its context-map.

Parameters:
  • model – the model-tree (sequence, mapping, pandas-types)
  • path_ops (dict) – A map of json_paths –> ModelOperations instances acting on the unified-model. The path_ops may often be empty.

Examples

To change the default DataFrame –> dictionary convertor for a submodel, use the following:

>>> mdl = {'foo': 'bar'}
>>> submdl = ModelOperations(mdl, conv={(pd.DataFrame, dict): lambda df: df.to_dict('record')})
build()[source]

Attempts to build the model by exhausting build_iter(), or raises its 1st error.

Use this method when you do not want to waste time getting the full list of errors.

build_iter()[source]

Iteratively build model, yielding any problems as ValidationError instances.

For debugging, the unified model at model may contain intermediate results at any time, even if construction has failed. Check the _errored flag if necessary.

mod_global_operations(operations=None, **cntxt_kwargs)[source]

Since it is the fall-back operation for conversions and IO operations, it must exist and have all its props well-defined for the class to work correctly.

Parameters:
  • operations (ModelOperations) – Replaces values of the installed context with non-empty values from this one.
  • cntxt_kwargs – Replaces the keyworded-values on the existing operations. See ModelOperations for supported keywords.
unified_contexts

A map of json_paths –> ModelOperations instances acting on the unified-model.

pandalone.pandata._NONE = <object object>

Denotes non-existent json-schema attribute in JSchema.

pandalone.pandata._U

alias of pandalone.pandata.United

pandalone.pandata._units_cleaner_regex = re.compile('^[[<]|[\\]>]$')

Regex stripping a leading '[' or '<' and a trailing ']' or '>' from a units-token in a table-column header.
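For instance, the pattern above strips one leading and one trailing units-bracket (reconstructed here for illustration, under the same pattern shown above):

```python
import re

# Same pattern as `_units_cleaner_regex` above: a leading '[' or '<',
# or a trailing ']' or '>'.
units_cleaner = re.compile(r'^[[<]|[\]>]$')

units_cleaner.sub('', '[kg]')        # -> 'kg'
units_cleaner.sub('', '<bar/krow>')  # -> 'bar/krow'
```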

pandalone.pandata.iter_jsonpointer_parts(jsonpath)[source]

Generates the jsonpath parts according to jsonpointer spec.

Parameters:jsonpath (str) – a jsonpath to resolve within document
Returns:The parts of the path, as a generator, without converting any step to int.
Author:Julian Berman, ankostis

Examples:

>>> list(iter_jsonpointer_parts('/a/b'))
['a', 'b']

>>> list(iter_jsonpointer_parts('/a//b'))
['a', '', 'b']

>>> list(iter_jsonpointer_parts('/'))
['']

>>> list(iter_jsonpointer_parts(''))
[]

But paths must be strings beginning with a slash('/') (rejecting paths ending with a slash is not implemented yet):

>>> list(iter_jsonpointer_parts(None))
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'split'

>>> list(iter_jsonpointer_parts('a'))
Traceback (most recent call last):
jsonschema.exceptions.RefResolutionError: Jsonpointer-path(a) must start with '/'!

#>>> list(iter_jsonpointer_parts('/a/'))
#Traceback (most recent call last):
#jsonschema.exceptions.RefResolutionError: Jsonpointer-path(a) must NOT ends with '/'!
pandalone.pandata.iter_jsonpointer_parts_relaxed(jsonpointer)[source]

Like iter_jsonpointer_parts() but accepting also non-absolute paths.

The 1st step of absolute-paths is always ‘’.

Examples:

>>> list(iter_jsonpointer_parts_relaxed('a'))
['a']
>>> list(iter_jsonpointer_parts_relaxed('a/'))
['a', '']
>>> list(iter_jsonpointer_parts_relaxed('a/b'))
['a', 'b']

>>> list(iter_jsonpointer_parts_relaxed('/a'))
['', 'a']
>>> list(iter_jsonpointer_parts_relaxed('/a/'))
['', 'a', '']

>>> list(iter_jsonpointer_parts_relaxed('/'))
['', '']

>>> list(iter_jsonpointer_parts_relaxed(''))
['']
pandalone.pandata.parse_value_with_units(arg)[source]

Parses name-units pairs (i.e. used as a table-column header).

Returns:a United(name, units) named-tuple, or None if bad syntax; note that name='' but units=None when missing.

Examples:

>>> parse_value_with_units('value [units]')
United(name='value', units='units')

>>> parse_value_with_units('foo   bar  <bar/krow>')
United(name='foo   bar', units='bar/krow')

>>> parse_value_with_units('no units')
United(name='no units', units=None)

>>> parse_value_with_units('')
United(name='', units=None)

But notice:

>>> assert parse_value_with_units('ok but [bad units') is None

>>> parse_value_with_units('<only units>')
United(name='', units='only units')

>>> parse_value_with_units(None)  
Traceback (most recent call last):
TypeError: expected string or ...
pandalone.pandata.resolve_jsonpointer(doc, jsonpointer, default=<object object>)[source]

Resolve a jsonpointer within the referenced doc.

Parameters:
  • doc – the referent document
  • jsonpointer (str) – a jsonpointer to resolve within the document
  • default – A value to return if path does not resolve.
Returns:

the resolved doc-item or raises RefResolutionError

Raises:

RefResolutionError (if cannot resolve path and no default)

Examples:

>>> dt = {
...     'pi':3.14,
...     'foo':'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc':'def'}),
...     }
... }
>>> resolve_jsonpointer(dt, '/pi', default=_scream)
3.14
>>> resolve_jsonpointer(dt, '/pi/BAD')
Traceback (most recent call last):
jsonschema.exceptions.RefResolutionError: Unresolvable JSON pointer('/pi/BAD')@(BAD)
>>> resolve_jsonpointer(dt, '/pi/BAD', 'Hi!')
'Hi!'
Author:Julian Berman, ankostis
pandalone.pandata.resolve_path(doc, path, default=<object object>, root=None)[source]

Like resolve_jsonpointer() also for relative-paths & attribute-branches.

Parameters:
  • doc – the referent document
  • path (str) – An absolute or relative path to resolve within the document.
  • default – A value to return if path does not resolve.
  • root – Document for absolute paths, assumed doc if missing.
Returns:

the resolved doc-item or raises RefResolutionError

Raises:

RefResolutionError (if cannot resolve path and no default)

Examples:

>>> dt = {
...     'pi':3.14,
...     'foo':'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc':'def'}),
...     }
... }
>>> resolve_path(dt, '/pi', default=_scream)
3.14
>>> resolve_path(dt, 'df/V')
0    1.0
1    1.0
2    1.0
Name: V, dtype: float64
>>> resolve_path(dt, '/pi/BAD', 'Hi!')
'Hi!'
Author:Julian Berman, ankostis
pandalone.pandata.set_jsonpointer(doc, jsonpointer, value, object_factory=<class 'collections.OrderedDict'>)[source]

Sets the given value at the jsonpointer within the referenced doc.

Parameters:
  • doc – the referent document
  • jsonpointer (str) – a jsonpointer to the node to modify
Raises:

RefResolutionError (if jsonpointer is empty, missing, or has invalid-content)
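A minimal self-contained sketch of the documented behavior (not pandalone’s actual implementation, which also handles sequences and raises RefResolutionError on bad pointers):

```python
from collections import OrderedDict

def set_jsonpointer_sketch(doc, jsonpointer, value, object_factory=OrderedDict):
    # Walk an absolute jsonpointer (must start with '/'), creating missing
    # intermediate mappings with `object_factory`, and set `value` at the end.
    parts = jsonpointer.split('/')[1:]
    for part in parts[:-1]:
        if part not in doc:
            doc[part] = object_factory()
        doc = doc[part]
    doc[parts[-1]] = value

doc = {'a': {'b': 1}}
set_jsonpointer_sketch(doc, '/a/b', 2)         # overwrite an existing leaf
set_jsonpointer_sketch(doc, '/a/new/leaf', 3)  # create an intermediate node
```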