4. API reference¶
These are the modules in maturity order:
xleash |
A mini-language for “throwing the rope” around rectangular areas of Excel-sheets. |
mappings |
Hierarchical string-like objects useful for indexing, that can be rename/relocated at a later stage. |
pandata |
A pandas-model is a tree of strings, numbers, sequences, dicts, pandas instances and resolvable URI-references, implemented by Pandel. |
components |
Defines the building-blocks of a “model”: |
4.1. Module: pandalone.xleash¶
A mini-language for “throwing the rope” around rectangular areas of Excel-sheets.
4.1.1. About¶
Any decent dataset is stored in csv. Consequently, many datasets are still trapped in excel-sheets.
XLeash defines a url-fragment notation (xl-ref) that renders the capturing of tables from sheets as practical as reading a csv, even when the exact position of those tables is not known beforehand.
An additional goal is to apply the same lassoing operation recursively, to build data-trees. To that end, the syntax supports filter transformations such as:
- setting the dimensionality of the result tables,
- creating higher-level objects from 2D capture-rect (dictionaries, numpy-arrays & dataframes).
It is based on xlrd library but also checked for compatibility with xlwings COM-client library. It requires numpy and (optionally) pandas. Since May 2020 it is python-3.6+ only.
4.1.2. Overview¶
The xl-ref notation extends ordinary A1 and RC excel coordinates with
conditional traversing operations, based on the cell’s empty/full state.
For instance, to extract a contiguous table near the A1 cell,
and make a pandas.DataFrame out of it, use this:
from pandalone import xleash, SheetsFactory
shfac = SheetsFactory()
shfac.list_sheetnames('path/to/workbook.xlsx')
['Sheet1', ...]
## Search and capture the first contiguous table from the 1st sheet
# as a pandas-DataFrame:
df = xleash.lasso('path/to/workbook.xlsx#0!A1(DR):..(DR):RLDU:["df"]',
sheets_factory=shfac)
## Assuming the sheet contains a single table, a lone `:` fetches
# the same contents. Additionally, it is possible
# to skip the sheetname/sheet-index (1st sheet implied).
df = xleash.lasso('#:["df"]',
                  url_file='path/to/workbook.xlsx',
                  sheets_factory=shfac)
4.1.2.1. Xl-ref Syntax¶
[<url>]#[<sheet>!][<1st-edge>][:[<2nd-edge>][:<expansions>]][:<filters>]
- See edge, expansion-moves, filters for details.
- Missing edges are implicitly replaced by ^^:__ (top-left/bottom-right).
- Spaces are allowed only in filters.
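To make the grammar above more concrete, here is a minimal sketch that splits an xl-ref string into its top-level parts with plain string operations. It is not the library's parser (the library exposes parse_xlref for that); the name split_xlref and the keys of the returned dict are hypothetical, and the sketch assumes that the filters part starts at the first `[`, `{` or `"` of the fragment and that sheet names contain none of those characters.

```python
def split_xlref(xlref):
    """Split an xl-ref into its top-level parts (hypothetical helper, not xleash API)."""
    url, sep, fragment = xlref.partition("#")
    if not sep:
        raise ValueError("An xl-ref needs a '#' fragment: %r" % xlref)

    # Assumption: filters are json, so they begin at the first '[', '{' or '"'.
    starts = [i for i in (fragment.find(ch) for ch in '[{"') if i >= 0]
    cut = min(starts) if starts else len(fragment)
    head, filters = fragment[:cut], fragment[cut:]

    # The optional sheet name/index is separated from the coordinates by '!'.
    sheet, coords = None, head
    if "!" in head:
        sheet, _, coords = head.partition("!")

    # Drop the ':' that only separates the coordinates from the filters.
    if filters and coords.endswith(":"):
        coords = coords[:-1]

    parts = coords.split(":")
    if len(parts) > 3:
        raise ValueError("Too many ':'-sections in %r" % xlref)
    parts += [""] * (3 - len(parts))
    st_edge, nd_edge, expansions = (part or None for part in parts)

    return {
        "url": url or None,
        "sheet": sheet or None,
        "st_edge": st_edge,
        "nd_edge": nd_edge,
        "expansions": expansions,
        "filters": filters or None,
    }


parts = split_xlref('path/to/workbook.xlsx#0!A1(DR):..(DR):RLDU:["df"]')
print(parts["st_edge"], parts["nd_edge"], parts["expansions"], parts["filters"])
```

The sketch performs no validation of the edges themselves and does not check the "no spaces outside filters" rule; any part it returns as None is one that the notation would fill in implicitly.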
4.1.2.2. Annotated Example¶
target-moves─────┐
landing-cell──┐ │
┌┤ ┌┤
#C3(UL):..(RD):RULD:["pipe": ["dict", "recursive"]]
└─┬──┘ └─┬──┘ └┬─┘ └──────────────┬──────────────┘
1st-edge───────┘ │ │ │
2nd-edge──────────────┘ │ │
expansions──────────────────┘ │
filters────────────────────────────────────────┘
Which means:
- Target the 1st edge of the capture-rect by starting from the C3 landing-cell. If it is a full-cell, stop, otherwise start moving above and to the left of C3 and stop on the first full-cell;
- continue from the last target and travel the exterior row and column right and down, stopping on their last full-cell;
- capture all the cells between the 2 targets;
- try expansions to all directions if any neighbouring full-cell;
- finally filter the values of the capture-rect to wrap them up in an ordered-dictionary, and dive into its values searching for xl-ref, and replace them.
4.1.2.3. Basic Usage¶
The simplest way to lasso a xl-ref is through lasso().
A common task is to capture all non-empty cells of the 1st workbook-sheet but
without any bordering nulls:
>>> from pandalone import xleash
>>> values = xleash.lasso('path/to/workbook.xlsx#:')
Assuming that the full-cells of the 1st sheet of the workbook on disk are
those marked with 'X', then the resulting capture-rect of the above call
would be a 2D list-of-lists with the values contained in C2:E4:
A B C D E
1 ┌─────┐
2 │ X│
3 │X │
4 │ X │
5 └─────┘
If another sheet is desired, add its name or 0-based ordinal immediately after #
separated by a ! with the rest of the xl-ref - which in that case
might be empty:
>>> lasso = xleash.lasso
>>> lasso('Book.xlsx#Sheet1!') == lasso('Book.xlsx#0!') == lasso('Book.xlsx#:')
True
If you do not wish to let the library read your workbooks, you can
invoke the function with a pre-loaded sheet.
Here we will use the utility ArraySheet with a more complicated
xl-ref expression:
>>> sheet = xleash.ArraySheet([[None, None, 'A', None],
... [None, 2.2, 'foo', None],
... [None, None, 2, None],
... [None, None, None, 3.14],
... ])
>>> xleash.lasso('#A1(DR):..(DR):RULD', sheet=sheet)
[[None, 'A'],
[2.2, 'foo'],
[None, 2]]
The capture-rect in this case was B1:C3, as can be seen by inspecting
the st and nd fields of the full Lasso result returned:
>>> xleash.lasso('#A1(DR):..(DR):RULD', sheet=sheet, return_lasso=True)
Lasso(xl_ref='#A1(DR):..(DR):RULD',
url_file=None,
sh_name=None,
st_edge=Edge(land=Cell(row='1', col='A'), mov='DR', mod=None),
nd_edge=Edge(land=Cell(row='.', col='.'), mov='DR', mod=None),
exp_moves='RULD',
call_spec=None,
sheet=ArraySheet(SheetId(book='wb', ids=['sh', 0]),
[[None None 'A' None]
[None 2.2 'foo' None]
[None None 2 None]
[None None None 3.14]]),
st=Coords(row=0, col=1),
nd=Coords(row=2, col=2),
values=[[None, 'A'],
[2.2, 'foo'],
[None, 2]],
base_coords=None,
...
For controlling explicitly the configuration parameters and the opening of
workbooks, use separate instances of Ranger and SheetsFactory,
that are the workhorses of this library:
>>> with xleash.SheetsFactory() as sf:
... sf.add_sheet(sheet, wb_ids='foo_wb', sh_ids='Sheet1')
... ranger = xleash.Ranger(sf, base_opts={'verbose': True})
... ranger.do_lasso('foo_wb#Sheet1!__').values
3.14
Notice that it returned a scalar value since we specified only the 1st edge
as '__', which points to the bottom row and right-most column of the sheet.
Alternatively you can call the make_default_Ranger() for extending
library’s defaults.
4.1.2.4. More Syntax Examples¶
Another typical but more advanced case is when a sheet contains a single table with a “header”-row and an “index”-column. There are (at least) 3 ways to do it, beyond specifying the exact coordinates:
A B C D E
1 ┌───────┐ B2:E4 ## Exact referencing.
2 │ X X X│ ^^:__ or : ## From top-left full-cell to bottom-right.
3 │X X X X│ A1(DR):__:U1 ## Start from A1 and move down and right
3 │X X X X│ # until B3; capture till bottom-right;
4 │X X X X│ # expand once upwards (to header row).
└───────┘ A1(RD):__:L1 ## Start from A1 and move down by row
# until C1; capture till bottom-right;
# expand once left (to index column).
Note that if B1 were full, the results would still be the same, because
? expands only if any full-cell found in row/column.
In case the sheet contains more than one disjoint table, the bottom-right cell of the sheet would not coincide with the table-end, so the handy last two xl-refs above would not work.
For that we may resort to dependent referencing for the 2nd edge, and define its position in relation to the 1st target:
A B C D E
1 ┌─────┐ _^:..(LD+):L1 ## Start from top-right(E2) and target left
2 │ X X│ # left(D2); from there capture left-down
3 │X X X│ # till 1st empty-cell(C4, regardless of
4 │X X X│ # col/row order); expand left once.
└─────┘ ^_(U):..(UR):U1 ## Start from B5 and target 1st cell up;
5 X # capture from there till D3; expand up.
In the presence of empty-cell breaking the exterior row/column of the 1st landing-cell, the capturing becomes more intricate:
A B C D E
1 ┌─────┐ B2:D_
2 │ X X│ A1(RD):..(RD):L1D
3 │X X │ D_:^^
3 │X │ A^(DR):D_:U
4 │ X │X
└─────┘
A B C D E
1 ┌───┐ ^^(RD):..(RD)
2 │X X│ _^(R):^.(DR)
3 X│X │
└───┘
3 X
4 X X
A B C D E
1 ┌───┐ B2:C4
2 │ X│X A1(RD):^_
3 │X X│ C_:^^
3 │X │ A^(DR):C_:U
4 │ X│ X ^^(RD):..(D):D
└───┘ D2(L+):^_
See also
Example spreadsheet: xleash.xlsx
4.1.3. Definitions¶
- lasso
- lassoing
It may denote 3 things:
- the whole procedure of parsing the xl-ref syntax, capturing values from spreadsheet rect-regions and sending them through any filters specified in the xl-ref;
- the lasso() and/or Ranger.do_lasso() functions performing the above job;
- the Lasso storing intermediate and final results of the above algorithm.
- xl-ref
Any url with its fragment abiding to the syntax defined herein.
- The fragment describes how to capture rects from excel-sheets, and it is composed of 2 edge references followed by expansions and filters.
- The file-part should resolve to an excel-file.
- parse
- parsing
- The stage where the input string gets split and checked for validity against the xl-ref syntax.
- edge
An edge might signify:
- the syntactic construct of the xl-ref, composed of a pair of row/column coordinates, optionally followed by parenthesized target-moves, like A1(LU);
- the bounding cells of the target-rect;
- the bounding cells of the capture-rect.
- 1st
- 2nd
It may refer to the 1st/2nd:
- edge of some xl-ref;
- landing-cell of an edge;
- target-cell of an edge;
- capture-cell of a capture-rect.
The 1st-edge supports absolute coordinates only, while the 2nd-edge also supports dependent ones from the 1st target-cell.
- landing-cell
- The cell identified by the coordinates of the edge alone.
- target-cell
- target-rect
- The bounding cell identified after applying target-moves on the landing-cell.
- target
- targeting
The process of identifying any target-cell bounding the target-rect.
- The search for the target-cell starts from the landing-cell, follows the specified target-moves, and ends when a state-change is detected on an exterior column or row, according to the enacted termination-rule.
- Failure to identify any target-cell raises an EmptyCaptureException which is subsequently translated as empty capture-rect by Ranger when opts contain {"no_empty": false} (default).
- The process is followed by expansions to identify the capture-rect.
Note that in the case of a dependent 2nd edge, the target-rect would always be the same, irrespective of whether target-moves denoted a row-by-row or column-by-column traversal.
- capture
- capturing
It is the overall procedure of:
- targeting both edge refs to come up with the target-rect;
- performing expansions to identify the capture-rect;
- extracting the values and feeding them to filters.
- capture-rect
- capture-cell
- The rectangular-area of the sheet denoted by the two capture-cells identified by capturing, that is, after applying expansions on target-rect.
- directions
- The 4 primitive directions that are denoted with one of the letters LURD. They are used to express both target-moves and expansions.
- coordinate
- coordinates
- Any pair of row/column coordinates specifying cell positions (i.e. landing-cell, target-cell, bounds of the capture-rect), written as the first part of the edge syntax, or implicitly resolved. They can be expressed in A1 or RC format or as a zero-based (row, col) tuple (num). Each coordinate might be absolute or dependent, independently.
- traversing
- traversal-operations
- Either the target-moves or the expansion-moves that comprise the capturing.
- target-moves
Specify the cell traversing order while targeting, using pairs of primitive directions. The pairs UD and LR (and their inverses) are invalid. E.g. DR means:
“Start going right, column-by-column, traversing each column from top to bottom.”
- move-modifier
- One of the + and - chars that might trail the target-moves and define which termination-rule to follow if landing-cell is full-cell, e.g. A1(RD+)
- expansions
- expansion-moves
Due to state-change on the ‘exterior’ cells, the capture-rect might be smaller than a wider contiguous but “convex” rectangular area.
The expansions attempt to remedy this by providing for expanding in arbitrary directions, accompanied by a multiplicity for each one. If the multiplicity is unspecified, infinite is assumed, so it expands until an empty/full row/column is met.
- absolute
Any cell row/col identified with column-characters, row-numbers, or the following special-characters:
- ^ The top/left full-cell coordinate.
- _ The bottom/right full-cell coordinate.
- dependent
- base-cell
A landing-cell whose any coordinate is identified with a dot (.), which resolves to the base-coordinate depending on which edge it is referring to:
- 1st edge: The coordinates of the base-cell field of the Lasso given to the Ranger.do_lasso(); must not be None.
- 2nd edge: the target-cell coordinates of the 1st edge.
An edge might contain a “mix” of absolute and dependent coordinates.
- state
- full-cell
- empty-cell
- A cell is full when it is not empty / blank (in Excel’s parlance).
- states-matrix
- A boolean matrix denoting the state of the cells, having the same size as a sheet it was derived from.
- state-change
- Whether we are traversing from an empty-cell to a full-cell, and vice-versa, while targeting.
- termination-rule
The condition to stop targeting while traversing from landing-cell. There are 2 rules: search-same and search-opposite.
See also
Check Target-termination enactment for the enactment of the rules.
- search-opposite
- The target-cell is the FIRST full-cell found while traveling from the landing-cell according to the target-moves.
- search-same
- The coordinates of the target-cell are given by the LAST full-cell on the exterior column/row according to the target-moves; the order of the moves is insignificant in that case.
- exterior
- The column and the row of the landing-cell; the search-same termination-rule gets to be triggered by ‘full-cells’ only on them.
- filter
- filters
- The last part of the xl-ref specifying predefined functions to apply for transforming the cell-values of capture-rect, abiding to the json syntax. They may be bulk or element-wise.
- bulk
- bulk-filter
- A filter treating capture-rect values as a whole, e.g. transposing arrays, is_empty.
- element-wise
- element-wise-filter
- A filter diving into capture-rect values, i.e. for python-eval.
- call-specifier
- call-spec
The structure to specify some function call in the filter part; it can either be a json string, list or object like that:
- string: "func_name"
- list: ["func_name", ["arg1", "arg2"], {"k1": "v1"}] where the last 2 parts are optional and can be given in any order;
- object: {"func": "func_name", "args": ["arg1"], "kwds": {"k":"v"}} where the args and kwds are optional.
If the outer-most filter is a dictionary, a 'pop' kwd is popped-out as the opts.
- opts
- Key-value pairs affecting the lassoing (i.e. opening xlrd-workbooks). Read the code to be sure what are the available choices :-( They are a combination of options specified in code (i.e. in the lasso()) and those extracted from filters by the ‘opts’ key, and they are stored in the Lasso.
- backend
- backends
IO level object providing the actual spreadsheet cells for capturing. Each backend may provide for its workbooks and sheets corresponding to:
- different implementations (e.g. xlrd or xlwings library), or
- different origins (e.g. file-based, network-based per url).
The decision which backend to use is taken by the sheets-factory following a bidding process.
- sheets-factory
- IO level object acting as the caching manager for spreadsheets fetched from different backends. The caching happens per spreadsheet.
- bid
- bidding
- backend-bidding
- All backends are asked to provide their willingness to handle some xl-ref (see SimpleSheetFactory.decide_backend()). For a sibling sheet, always the parent backend is used.
- sheet
- spreadsheet
- IO level object that acts as the container of cells.
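The three call-spec forms listed in the definitions above (string, list, object) can be reduced to a single (func, args, kwds) triple. The following is only an illustrative sketch of such a normalization; the library has its own parse_call_spec, whose exact behavior is not reproduced here, and the name normalize_call_spec is hypothetical.

```python
import json


def normalize_call_spec(spec):
    """Return (func_name, args, kwds) from a string, list or dict call-spec.

    Hypothetical helper mirroring the three documented forms.
    """
    if isinstance(spec, str):
        return spec, [], {}

    if isinstance(spec, list):
        if not spec or not isinstance(spec[0], str):
            raise ValueError("List call-spec must start with a func-name: %r" % (spec,))
        func, args, kwds = spec[0], [], {}
        # The args-list and the kwds-dict are optional and may come in any order.
        for part in spec[1:]:
            if isinstance(part, list):
                args = part
            elif isinstance(part, dict):
                kwds = part
            else:
                raise ValueError("Unexpected call-spec part: %r" % (part,))
        return func, args, kwds

    if isinstance(spec, dict):
        unknown = set(spec) - {"func", "args", "kwds"}
        if "func" not in spec or unknown:
            raise ValueError("Bad object call-spec: %r" % (spec,))
        return spec["func"], list(spec.get("args", [])), dict(spec.get("kwds", {}))

    raise ValueError("Unsupported call-spec type: %r" % (spec,))


# The filter part of an xl-ref is json, so it is decoded before normalizing.
print(normalize_call_spec(json.loads('["func_name", {"k1": "v1"}, ["arg1", "arg2"]]')))
```

Normalizing early keeps the rest of a filter pipeline free from type checks; the handling of the outer-most opts is deliberately left out of this sketch.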
4.1.4. Details¶
4.1.4.1. Target-moves¶
There are 12 target-moves named with a single or a pair of
letters denoting the 4 primitive directions, LURD:
U
UL◄───┐▲┌───►UR
LU │││ RU
▲ │││ ▲
│ │││ │
└─────┼│┼─────┘
L◄──────X──────►R
┌─────┼│┼─────┐
│ │││ │
▼ │││ ▼
LD │││ RD
DL◄───┘▼└───►DR
D
- The 'X' at the center points the starting cell.
So a RD move means “traverse cells first by rows then by columns”,
or, as a lengthier description:
“Start moving right till the 1st state change, and then move down to the next row, and start traversing right again.”
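The traversal order implied by a target-move can be sketched with a small generator over a boolean states-matrix. This is a simplified illustration and not the library's implementation: the names traversal_order and first_full_cell are hypothetical, and the sketch assumes that every new row/column of the traversal restarts at the landing-cell's own column/row rather than at the sheet's border.

```python
import numpy as np

# (row-step, col-step) for each primitive direction.
STEPS = {"L": (0, -1), "U": (-1, 0), "R": (0, 1), "D": (1, 0)}


def traversal_order(shape, land, moves):
    """Yield the (row, col) cells visited from `land` for a 1- or 2-letter move."""
    if len(moves) == 2 and (moves[0] in "UD") == (moves[1] in "UD"):
        raise ValueError("Invalid pair of directions: %r" % moves)
    height, width = shape

    def inside(r, c):
        return 0 <= r < height and 0 <= c < width

    inner = STEPS[moves[0]]  # 1st letter: walked cell-by-cell
    outer = STEPS[moves[1]] if len(moves) == 2 else None  # 2nd letter: next line
    row, col = land
    while inside(row, col):
        r, c = row, col
        while inside(r, c):
            yield r, c
            r, c = r + inner[0], c + inner[1]
        if outer is None:
            break
        row, col = row + outer[0], col + outer[1]


def first_full_cell(states_matrix, land, moves):
    """The first full-cell met in traversal order, or None if there is none."""
    for r, c in traversal_order(states_matrix.shape, land, moves):
        if states_matrix[r, c]:
            return r, c
    return None


values = [[None, None, "A", None],
          [None, 2.2, "foo", None],
          [None, None, 2, None],
          [None, None, None, 3.14]]
states = np.array([[cell is not None for cell in row] for row in values])
print(first_full_cell(states, (0, 0), "DR"))
```

With the sample values, starting from A1 with DR walks column A top-to-bottom, finds nothing, and then stops at the first full-cell of column B.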
4.1.4.2. Target-cells¶
Using these moves we can identify a target-cell in relation to
the landing-cell. For instance, given this xl-sheet below, there are
multiple ways to identify (or target) the non-empty values X, below:
A B C D E F
1
2
3 X ──────► C3 A1(RD) _^(L) F3(L)
4 X ──────► E4 A4(R) _4(L) D1(DR)
5 X ──────► B5 A1(DR) A_(UR) _5(L)
6 X ──────► F6 __ _^(D) A_(R)
- The 'X' signifies non-empty cells.
So we can target cells with “absolute coordinates”, the usual A1 notation,
augmented with the following special characters:
- underscore (_) for bottom/right, and
- accent (^) for top/left
columns/rows of the sheet with non-empty values.
When no LURD moves are specified, the target-cell coincides with the starting one.
See also
Target-termination enactment section
4.1.4.3. Capturing¶
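To specify a complete capture-rect we need to identify a 2nd cell. The 2nd target-cell may be specified either with absolute coordinates, as above, or with dependent coordinates that are relative to the 1st target-cell.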
In the above example-sheet, here are some ways to specify refs:
A B C D E F
1
2
┌─────┐
┌──┼─┐ │
3 │ │X│ │
│┌─┼─┼───┼┐
4 ││ │ │ X││
││ └─┼───┴┼───► C3:E4 A1(RD):..(RD) _^(L):..(DR) _4(L):A1(RD)
5 ││X │ │
│└───┼────┴───► B4:E5 A_(UR):..(RU) _5(L):1_(UR) E1(D):A.(DR)
6 │ │ X
└────┴────────► B3:C6 A1(RD):^_ ^^:C_ C_:^^
Warning
Of course, the above rects WILL FAIL since the target-moves
will stop immediately due to X values being surrounded by empty-cells.
But the above diagram was to just convey the general idea. To make it work, all the in-between cells of the peripheral row and columns should have been also non-empty.
Note
The capturing moves from 1st target-cell to 2nd target-cell are independent from the implied target-moves in the case of dependent coords.
More specifically, the capturing will always fetch the same values
regardless of “row-first” or “column-first” order; this is not the case
with targeting (LURD) moves.
For instance, to capture B4:E5 in the above sheet we may use
_5(L):E.(U).
In that case the target cells are B5 and E4 and the target-moves
to reach the 2nd one are UR which are different from the U
specified on the 2nd cell.
4.1.4.4. Target-termination enactment¶
The guiding principle for when to enact each rule is to always capture a matrix of full-cells.
- If the landing-cell is empty-cell, always search-opposite, that is, stop on the first full-cell.
- When the landing-cell is full-cell, it depends on the ‘move-modifier’:
  - If + exists, apply search-same.
  - If - exists, stop on landing-cell.
  - If no modifier, behave like - (stop on landing-cell) except when on a 2nd edge with both its coordinates dependent (..), where the search-same is applied.
So, both move-modifiers apply only when landing-cell is full-cell, and - actually makes sense only when 2nd edge is dependent.
If the termination condition is not met, an EmptyCaptureException
is raised, which is translated as empty capture-rect by Ranger
when opts contain {"no_empty": false} (default).
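The enactment rules of this section can be written down as a small decision function, together with a sketch of the search-same rule on a boolean states-matrix. Both functions are hypothetical illustrations (choose_rule and search_same are not xleash names), and the string labels returned by choose_rule are made up for this example.

```python
import numpy as np

# (row-step, col-step) for each primitive direction.
STEPS = {"L": (0, -1), "U": (-1, 0), "R": (0, 1), "D": (1, 0)}


def choose_rule(landing_is_full, modifier=None, is_2nd_edge=False, both_dependent=False):
    """Pick the termination behavior according to the rules listed above."""
    if not landing_is_full:
        return "search-opposite"
    if modifier == "+":
        return "search-same"
    if modifier == "-":
        return "stop-on-landing"
    if modifier is None:
        # No modifier behaves like '-', except on a fully dependent 2nd edge ('..').
        if is_2nd_edge and both_dependent:
            return "search-same"
        return "stop-on-landing"
    raise ValueError("Unknown move-modifier: %r" % (modifier,))


def search_same(states_matrix, land, moves):
    """Last full-cell along each move direction on the landing's exterior row/column."""
    height, width = states_matrix.shape
    row, col = land
    for letter in moves:
        dr, dc = STEPS[letter]
        r, c = land
        # Keep walking while the next cell is inside the sheet and full.
        while 0 <= r + dr < height and 0 <= c + dc < width and states_matrix[r + dr, c + dc]:
            r, c = r + dr, c + dc
        if dr:
            row = r
        else:
            col = c
    return row, col


states = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
], dtype=bool)
print(choose_rule(True, None, is_2nd_edge=True, both_dependent=True))
print(search_same(states, (3, 2), "DR"))
```

Because each direction is resolved independently along the exterior row and column, the order of the two letters does not affect the result of search_same, which agrees with the definition of that rule.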
4.1.4.5. Expansions¶
Captured-rects (“values”) may be limited due to empty-cell in the 1st
row/column traversed. To overcome this, the xl-ref may specify expansions
directions using a 3rd :-section like that:
_5(L):1_(UR):RDL1U1
This particular case means:
“Try expanding Right and Down repeatedly and then try once Left and Up.”
Expansion happens on a row-by-row or column-by-column basis, and terminates when a full empty(or non-empty) line is met.
Example-refs are given below for capturing the 2 marked tables:
A B C D E F G
1
┌───────────┐
│┌─────────┐│
2 ││ 1 X X ││
││ ││
3 ││X X X X││
││ ││
4 ││X X X 2 X││
││ ││
5 ││X X X X││
└┼─────────┼┴──► A1(RD):..(RD):DRL1
6 │X │
└─────────┴───► A1(RD):..(RD):L1DR A_(UR):^^(RD)
7 X
- The 'X' signifies non-empty cells.
- The '1' and '2' signify the identified target-cells.
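As a rough illustration of how expansions could work on a boolean states-matrix, the sketch below grows a rect one line at a time while the adjacent line contains any full-cell. It is a simplification and not the library's algorithm: the names parse_expansions, expand_once and expand_rect are hypothetical, and the grouping of moves (letters without a number repeat together until nothing changes, a letter followed by a number is tried that many times) is an assumption based on the `RDL1U1` description above.

```python
import re

import numpy as np


def parse_expansions(exp_moves):
    """Split e.g. 'RDL1U1' into [('RD', None), ('L', 1), ('U', 1)]."""
    groups = []
    for token in re.split(r"([LURD]\d+)", exp_moves.upper()):
        if not token:
            continue
        match = re.fullmatch(r"([LURD]+)(\d+)?", token)
        if match is None:
            raise ValueError("Invalid expansion-moves: %r" % exp_moves)
        letters, times = match.groups()
        groups.append((letters, int(times) if times else None))
    return groups


def expand_once(states, rect, letter):
    """Grow `rect` by one line towards `letter` if that line has any full-cell."""
    (r1, c1), (r2, c2) = rect
    height, width = states.shape
    if letter == "U" and r1 > 0:
        line, grown = states[r1 - 1, c1:c2 + 1], ((r1 - 1, c1), (r2, c2))
    elif letter == "D" and r2 < height - 1:
        line, grown = states[r2 + 1, c1:c2 + 1], ((r1, c1), (r2 + 1, c2))
    elif letter == "L" and c1 > 0:
        line, grown = states[r1:r2 + 1, c1 - 1], ((r1, c1 - 1), (r2, c2))
    elif letter == "R" and c2 < width - 1:
        line, grown = states[r1:r2 + 1, c2 + 1], ((r1, c1), (r2, c2 + 1))
    else:
        return rect  # already at the sheet's border
    return grown if line.any() else rect


def expand_rect(states, rect, exp_moves):
    """Apply each group of moves until it stops growing or its count is spent."""
    for letters, times in parse_expansions(exp_moves):
        rounds = 0
        while times is None or rounds < times:
            before = rect
            for letter in letters:
                rect = expand_once(states, rect, letter)
            if rect == before:
                break
            rounds += 1
    return rect


values = [[None, None, "A", None],
          [None, 2.2, "foo", None],
          [None, None, 2, None],
          [None, None, None, 3.14]]
states = np.array([[cell is not None for cell in row] for row in values])
# Target-rect B2:C2 as ((row, col), (row, col)), zero-based.
print(expand_rect(states, ((1, 1), (1, 2)), "RULD"))
```

For the sample sheet of the Basic Usage section, expanding the B2:C2 target-rect with RULD in this sketch ends up at the zero-based coords (0, 1) and (2, 2), that is B1:C3.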
4.1.5. Plugin Extensions¶
The xleash library already uses setuptools entry-points
to attach backends and pandas filters.
Read init_plugins() to learn how to implement other plugins.
4.1.6. API¶
User-facing higher-level functionality:
Lasso(xl_ref, url_file, sh_name, st_edge, …) |
All the fields used by the algorithm, populated stage-by-stage by Ranger. |
lasso(xlref[, sheets_factory, base_opts, …]) |
High-level function to lasso around spreadsheet’s rect-regions according to xl-ref strings by using internally a Ranger. |
Ranger(sheets_factory[, base_opts, …]) |
The director-class that performs all stages required for “throwing the lasso” around rect-values. |
Ranger.do_lasso(xlref, **context_kwds) |
The director-method that does all the job of throwing a lasso around spreadsheet’s rect-regions according to xl-ref. |
make_default_Ranger([sheets_factory, …]) |
Makes a defaulted Ranger. |
get_default_opts([overrides]) |
Default opts used by lasso() when constructing its internal Ranger. |
Related to capturing algorithm:
resolve_capture_rect(states_matrix, …[, …]) |
Performs targeting, capturing and expansions based on the states-matrix. |
coords2Cell(row, col) |
Make A1 Cell from resolved coords, with rudimentary error-checking. |
EmptyCaptureException |
Thrown when targeting fails. |
xlwings_dims_call_spec() |
A list call-spec for _redim_filter() filter that imitates results of xlwings library. |
Related to parsing and basic structure used throughout (module pandalone.xleash._parse):
parse_xlref |
parse_expansion_moves |
parse_call_spec |
Cell |
Coords |
Edge |
IO back-end functionality:
backend.SheetsFactory([backends]) |
A caching-store of ABCSheet instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends. |
backend.ABCBackend |
A plugin for a backend must implement and add instances into io_backends. |
backend.ABCBackend.open_sheet(wb_url, sheet_id) |
Open a ABCSheet subclass, if backend has won the bid. |
backend.ABCSheet.read_rect(st, nd) |
Fetch the actual values from the backend Excel-sheet. |
backend.ArraySheet(arr[, ids, ids]) |
A sample ABCSheet made out of 2D-list or numpy-arrays, for facilitating tests. |
backend.ABCSheet |
A delegating to backend factory and sheet-wrapper with utility methods. |
_xlrd.XlrdSheet(sheet, book_fname[, epoch1904]) |
The xlrd workbook wrapper required by xleash library. |
_xlrd._open_sheet_by_name_or_index(…) |
param int or str or None sheet_id: |
Plugin related:
_init_plugins |
_plugins_installed |
_PLUGIN_GROUP_NAME |
io_backends |
installed_filters |
-
pandalone.xleash._init_plugins(plugin_group_name='pandalone.xleash.plugins')[source]¶
Discover and load plugins.
The xleash library already uses setuptools entry-points to attach backend Sheet and pandas filters.
You may re-invoke after some pip install <some-xleash-plugin>.
## setup.py configurations
To implement a new plugin, you have to package your code as a regular python distribution and add the following declaration inside its setup.py:
setup(
    # ...
    entry_points = {
        'pandalone.xleash.plugins': [
            'plugin_1 = <foo.plugin.module>:<plugin-install-func>',  ## Load & install.
            'plugin_2 = <bar.plugin.module>',                        ## Load only.
        ]
    }
)
## Implementing a plugin
The plugins are initialized during import time in a 2-stage procedure by init_plugins(). A plugin is loaded and optionally installed if the setup-configuration above specifies a no-args <plugin-install-func> callable. Any collected <plugin-install-func> callables are invoked AFTER all plugin-modules have finished loading.
Tip
For an example, study how this project sets up its own backend and filters.
Warning
When appending into “hook” lists during installation, remember to avoid re-inserting duplicate items. In general try to well-behave even when plugins are initialized multiple times!
-
pandalone.xleash._PLUGIN_GROUP_NAME= 'pandalone.xleash.plugins'¶ Used to discover setuptools extension-points.
-
pandalone.xleash.resolve_capture_rect(states_matrix, up_dn_margins, st_edge, nd_edge=None, exp_moves=None, base_coords=None)[source]¶
Performs targeting, capturing and expansions based on the states-matrix.
To get the margin_coords, use one of:
Its results can be fed into read_capture_values().
Parameters:
- states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
- up_dn_margins ((Coords, Coords)) – the top-left/bottom-right coords with full-cells
- st_edge (Edge) – “uncooked” as matched by regex
- nd_edge (Edge) – “uncooked” as matched by regex
- exp_moves (list or None) – Just the parsed string, and not None.
- base_coords (Coords) – The base for a dependent 1st edge.
Returns: a (Coords, Coords) with the 1st and 2nd capture-cell ordered from top-left –> bottom-right.
Raises: EmptyCaptureException – When targeting failed, and no target cell identified.
Examples:
>>> import numpy as np
>>> from pandalone.xleash import Edge, margin_coords_from_states_matrix
>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ], dtype=bool)
>>> up, dn = margin_coords_from_states_matrix(states_matrix)
>>> st_edge = Edge(Cell('1', 'A'), 'DR')
>>> nd_edge = Edge(Cell('.', '.'), 'DR')
>>> resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
(Coords(row=3, col=2), Coords(row=4, col=2))
Using dependent coordinates for the 2nd edge:
>>> st_edge = Edge(Cell('_', '_'), None)
>>> nd_edge = Edge(Cell('.', '.'), 'UL')
>>> rect = resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
>>> rect
(Coords(row=2, col=2), Coords(row=4, col=5))
Using sheet’s margins:
>>> st_edge = Edge(Cell('^', '_'), None)
>>> nd_edge = Edge(Cell('_', '^'), None)
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True
Walking backwards:
>>> st_edge = Edge(Cell('^', '_'), 'L')          # Landing is full, so 'L' ignored.
>>> nd_edge = Edge(Cell('_', '_'), 'L', '+')     # '+' or would also stop.
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True
-
class pandalone.xleash.ABCSheet[source]¶
Bases: abc.ABC
A delegating to backend factory and sheet-wrapper with utility methods.
Parameters:
- _states_matrix (np.ndarray) – The states-matrix cached, so recreate object to refresh it.
- _margin_coords (dict) – limits used by _resolve_cell(), cached, so recreate object to refresh it.
Resource management is outside of the scope of this class, and must happen in the backend workbook/sheet instance.
xlrd examples:
>>> import xlrd
>>> with xlrd.open_workbook(self.tmp) as wb:
...     sheet = xleash.xlrdSheet(wb.sheet_by_name('Sheet1'))
...     ## Do whatever
win32 examples:
>>> with dsgdsdsfsd as wb:
...     sheet = xleash.win32Sheet(wb.sheet['Sheet1'])
TODO: Win32 Sheet example
-
_read_margin_coords()[source]¶
Override if possible to read (any of the) limits directly from the sheet.
Returns: the 2 coords of the top-left & bottom-right full cells; any one of the coords can be None. By default returns (None, None).
Return type: (Coords, Coords)
Raise: EmptyCaptureException if sheet empty
-
_read_states_matrix()[source]¶
Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
-
get_margin_coords()[source]¶
Extract (and cache) margins either internally or from margin_coords_from_states_matrix().
Returns: the resolved top-left and bottom-right xleash.Coords
Return type: tuple
Raise: EmptyCaptureException if sheet empty
-
get_sheet_ids()[source]¶
Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type: SheetId or None
-
get_states_matrix()[source]¶
Read and cache the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
Raise: EmptyCaptureException if sheet empty
-
read_rect(st, nd)[source]¶
Fetch the actual values from the backend Excel-sheet.
Returns: Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Raise: EmptyCaptureException (optionally) if sheet empty
-
class pandalone.xleash.ArraySheet(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶
Bases: pandalone.xleash.io.backend.ABCSheet
A sample ABCSheet made out of 2D-list or numpy-arrays, for facilitating tests.
-
__init__(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶
Initialize self. See help(type(self)) for accurate signature.
-
_read_states_matrix()[source]¶
Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
-
get_sheet_ids()[source]¶
Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type: SheetId or None
-
read_rect(st, nd)[source]¶
Fetch the actual values from the backend Excel-sheet.
Returns: Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Raise: EmptyCaptureException (optionally) if sheet empty
-
-
pandalone.xleash.coords2Cell(row, col)[source]¶
Make A1 Cell from resolved coords, with rudimentary error-checking.
Examples:
>>> coords2Cell(row=0, col=0)
Cell(row='1', col='A')
>>> coords2Cell(row=0, col=26)
Cell(row='1', col='AA')
>>> coords2Cell(row=10, col='.')
Cell(row='11', col='.')
>>> coords2Cell(row=-3, col=-2)
Traceback (most recent call last):
AssertionError: negative row!
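The conversion from zero-based coords to A1 notation shown in these examples can be sketched as follows. This is a stand-alone illustration, not the code of coords2Cell: the names col_index_to_letters and to_a1 are hypothetical, the dot (dependent) coordinates are not handled, and a ValueError is raised for negatives instead of the AssertionError seen above.

```python
def col_index_to_letters(col):
    """Convert a zero-based column index into Excel letters (0 -> 'A', 26 -> 'AA')."""
    if col < 0:
        raise ValueError("negative col!")
    letters = ""
    col += 1  # switch to the 1-based "bijective base-26" numbering
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_a1(row, col):
    """Return the (row, col) strings of the A1 notation for zero-based coords."""
    if row < 0:
        raise ValueError("negative row!")
    return str(row + 1), col_index_to_letters(col)


print(to_a1(0, 0))
print(to_a1(0, 26))
```

The subtraction inside divmod is needed because Excel columns have no zero digit: after 'Z' comes 'AA', not 'BA'.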
-
exception pandalone.xleash.EmptyCaptureException[source]¶
Bases: Exception
Thrown when targeting fails.
-
pandalone.xleash.margin_coords_from_states_matrix(states_matrix)[source]¶
Returns top-left/bottom-right margins of full cells from a state matrix.
May be used by ABCSheet.get_margin_coords() if a backend does not report the sheet-margins internally.
Parameters: states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
Returns: the 2 coords of the top-left & bottom-right full cells
Return type: (Coords, Coords)
Examples:
>>> states_matrix = np.asarray([
...     [0, 0, 0],
...     [0, 1, 0],
...     [0, 1, 1],
...     [0, 0, 1],
... ])
>>> margins = margin_coords_from_states_matrix(states_matrix)
>>> margins
(Coords(row=1, col=1), Coords(row=3, col=2))
Note that the bottom-right cell is not the same as the states_matrix matrix size:
>>> states_matrix = np.asarray([
...     [0, 0, 0, 0],
...     [0, 1, 0, 0],
...     [0, 1, 1, 0],
...     [0, 0, 1, 0],
...     [0, 0, 0, 0],
... ])
>>> margin_coords_from_states_matrix(states_matrix) == margins
True
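One way such margins could be derived with numpy is sketched below; it is an illustration under assumptions, not the library's code. The Coords namedtuple is a local stand-in for the library's class, the name margins_of_full_cells is hypothetical, and raising ValueError on an all-empty matrix is a choice of this sketch only.

```python
from collections import namedtuple

import numpy as np

# Local stand-in for xleash's Coords, only for this sketch.
Coords = namedtuple("Coords", "row col")


def margins_of_full_cells(states_matrix):
    """Return the (top-left, bottom-right) Coords bounding all full cells."""
    rows, cols = np.nonzero(np.asarray(states_matrix))
    if rows.size == 0:
        raise ValueError("No full cells in the states-matrix!")
    top_left = Coords(int(rows.min()), int(cols.min()))
    bottom_right = Coords(int(rows.max()), int(cols.max()))
    return top_left, bottom_right


states_matrix = np.asarray([
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
])
print(margins_of_full_cells(states_matrix))
```

Since only the indices of the non-zero cells matter, padding the matrix with empty rows or columns at the bottom/right leaves the result unchanged, as in the second example above.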
-
pandalone.xleash.lasso(xlref, sheets_factory=None, base_opts=None, available_filters=None, return_lasso=False, **context_kwds)[source]¶
High-level function to lasso around a spreadsheet’s rect-regions according to xl-ref strings, using a Ranger internally.
Parameters:
- xlref (str) – a string with the xl-ref format:
<url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
i.e.:
file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
- sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, the new SheetsFactory created is closed afterwards. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
- available_filters (dict or None) – Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
- return_lasso (bool) – If True, values are contained in the returned Lasso instance, along with all other artifacts of the lassoing procedure.
For more debugging help, create a Ranger yourself and inspect the Ranger.intermediate_lasso.
- context_kwds (Lasso) – Default Lasso fields in case parsed ones are None (i.e. you can specify the sheet like that).
Variables: base_opts – Opts affecting the lassoing procedure that are deep-copied and used as the base-opts for every Ranger.do_lasso(), whether invoked directly or recursively by recursive_filter(). Read the code to be sure what are the available choices. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
Returns: Either the captured & filtered values or the final Lasso, depending on the return_lasso arg.
Example:
sheet = _
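The top-level layout of the xl-ref format given above (<url_file>#<sheet>!...) can be explored with a small stand-alone sketch. It only splits on the '#' and '!' separators with the standard library; it is not pandalone's parser, the helper name split_xlref is hypothetical, and it naively assumes that no '!' appears inside the filters when the sheet part is omitted.

```python
def split_xlref(xlref):
    """Hypothetical helper: split an xl-ref into (url_file, sheet, rest)."""
    url_file, hash_, fragment = xlref.partition("#")
    if not hash_:
        raise SyntaxError("No fragment-part (starting with '#'): %s" % xlref)
    sheet, bang, rest = fragment.partition("!")
    if not bang:
        # No '!' found: no sheet was given, the whole fragment is edges/filters.
        sheet, rest = None, fragment
    # Empty parts are reported as None, like the missing fields of a parsed xl-ref.
    return (url_file or None, sheet or None, rest)


parts = split_xlref(
    'file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}'
)
```

The remaining part (edges, expansion moves and filters) is what the functions of the _parse submodule take apart.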
-
class
pandalone.xleash.Ranger(sheets_factory, base_opts=None, available_filters=None)[source]¶
Bases: object
The director-class that performs all stages required for “throwing the lasso” around rect-values.
Use it when you need to have total control of the procedure and configuration parameters, since no defaults are assumed.
The do_lasso() does the job.
Variables:
- sheets_factory (SheetsFactory) – Factory of sheets from where to parse rect-values; it is not closed in the end. May be None, but do_lasso() will scream unless invoked with a context_lasso arg containing a concrete ABCSheet.
- base_opts (dict) – The opts that are deep-copied and used as the defaults for every do_lasso(), whether invoked directly or recursively by recursive_filter(). If unspecified, no opts are used, but this attr is set to an empty dict. See get_default_opts().
- available_filters (dict or None) – The filters available for a xl-ref to use. If None, then uses xleash.installed_filters. Use an empty dict not to use any filters.
- intermediate_lasso (Lasso) – A ('stage', Lasso) pair with the last Lasso instance produced during the last execution of the do_lasso(). Used for inspecting/debugging.
-
__init__(sheets_factory, base_opts=None, available_filters=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
_make_init_Lasso(**context_kwds)[source]¶
Creates the lasso to be used for each new do_lasso() invocation.
-
_parse_and_merge_with_context(xlref, init_lasso)[source]¶
Merges the fields parsed from the xl-ref with init_lasso, reporting any errors.
Parameters: init_lasso (Lasso) – Default values to be overridden by non-nulls.
Returns: a Lasso with any non-None parsed-fields updated
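The "non-nulls override the defaults" rule can be illustrated with a plain namedtuple. This is only a sketch under assumptions: MiniLasso is a hypothetical, cut-down stand-in with 3 of the Lasso fields, and merge_with_context is not the library's method.

```python
from collections import namedtuple

# Hypothetical, cut-down stand-in for Lasso; every field defaults to None.
MiniLasso = namedtuple("MiniLasso", "url_file sh_name opts")
MiniLasso.__new__.__defaults__ = (None,) * len(MiniLasso._fields)


def merge_with_context(parsed_fields, init_lasso):
    """Return `init_lasso` updated with the non-None items of `parsed_fields`."""
    non_nulls = {k: v for k, v in parsed_fields.items() if v is not None}
    return init_lasso._replace(**non_nulls)


init = MiniLasso(url_file="book.xlsx", sh_name="Sheet1")
# The xl-ref gave no file (None), so the context's file survives the merge.
merged = merge_with_context({"url_file": None, "sh_name": "Data", "opts": {}}, init)
```

Since namedtuples are immutable, _replace() returns a new instance and the initial lasso stays untouched for the next invocation.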
-
_resolve_capture_rect(lasso, sheet)[source]¶
Also handles EmptyCaptureException in case opts['no_empty'] != False.
-
class
pandalone.xleash.SheetsFactory(backends=None)[source]¶
Bases: pandalone.xleash.io.backend.SimpleSheetsFactory
A caching-store of ABCSheet instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends.
Variables: _cached_sheets (dict) – A cache of all _Spreadsheets accessed so far, keyed by multiple keys generated by _derive_sheet_keys().
- To avoid opening non-trivial workbooks, use the add_sheet() to pre-populate this cache with them.
- It is a resource-manager for contained sheets, so it can be used with a with statement.
-
__init__(backends=None)[source]¶
Parameters: backends (list or None) – The list of backends to consider when opening sheets. If it evaluates to false, io_backends assumed.
-
_derive_sheet_keys(sheet, wb_ids=None, sh_ids=None)[source]¶
Returns the product of user-specified and sheet-internal keys.
Parameters:
- wb_ids – a single or a sequence of extra workbook-ids (e.g. file, url)
- sh_ids – a single or a sequence of extra sheet-ids (e.g. name, index, None)
-
pandalone.xleash.io_backends = [<pandalone.xleash.io._xlrd.XlrdBackend object>]¶
Hook for plugins to append ABCBackend instances.
-
pandalone.xleash.make_default_Ranger(sheets_factory=None, base_opts=None, available_filters=None)[source]¶
Makes a defaulted Ranger.
Parameters:
- sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, a new SheetsFactory is created. Remember to invoke its SheetsFactory.close() to clear resources from any opened sheets.
- base_opts (dict or None) – Default opts to affect the lassoing, to be merged with defaults; uses get_default_opts(). Read the code to be sure what are the available choices :-(.
- available_filters (dict or None) – The filters available for a xl-ref to use (xleash.installed_filters used if unspecified).
For instance, to make your own sheets-factory and override options, you may do this:
>>> from pandalone import xleash
>>> with xleash.SheetsFactory() as sf:
...     xleash.make_default_Ranger(sf, base_opts={'lax': True})
<pandalone.xleash._lasso.Ranger object at ...
-
class
pandalone.xleash.XLocation(sheet, st, nd, base_coords)¶
Bases: tuple
Fields denoting the position of a sheet/cell while running an element-wise filter.
Practically, run_filter_elementwise() preserves these fields if the processed ones were None.
-
__getnewargs__()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__(_cls, sheet, st, nd, base_coords)¶ Create new instance of XLocation(sheet, st, nd, base_coords)
-
__repr__()¶ Return a nicely formatted representation string
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make(iterable)¶ Make a new XLocation object from a sequence or iterable
-
_replace(**kwds)¶ Return a new XLocation object replacing specified fields with new values
-
base_coords¶ Alias for field number 3
-
nd¶ Alias for field number 2
-
sheet¶ Alias for field number 0
-
st¶ Alias for field number 1
-
-
pandalone.xleash.get_default_opts(overrides=None)[source]¶
Default opts used by lasso() when constructing its internal Ranger.
Parameters: overrides (dict or None) – Any items to update the default ones.
-
pandalone.xleash.installed_filters = {'df': {'func': <function _df_filter>}, 'dict': {'desc': <docstring of dict>, 'func': <function install_default_filters.<locals>.<lambda>>}, 'numpy': {'desc': <docstring of numpy.array>, 'func': <function install_default_filters.<locals>.<lambda>>}, 'odict': {'desc': 'Dictionary that remembers insertion order', 'func': <function install_default_filters.<locals>.<lambda>>}, 'pipe': {'func': <function pipe_filter>}, 'py': {'func': <function py_filter>}, 'pyeval': {'func': <function pyeval_filter>}, 'recurse': {'func': <function recursive_filter>}, 'redim': {'func': <function redim_filter>}, 'sorted': {'desc': <docstring of sorted>, 'func': <function install_default_filters.<locals>.<lambda>>}, 'sr': {'desc': 'Converts a 2-columns list-of-lists into pd.Series.' followed by the docstring of pandas.Series, 'func': <function install_filters.<locals>.<lambda>>}}¶
Hook for plugins to append filters.
-
class
pandalone.xleash.Lasso(xl_ref, url_file, sh_name, st_edge, nd_edge, exp_moves, call_spec, sheet, st, nd, values, base_coords, opts)¶
Bases: tuple
All the fields used by the algorithm, populated stage-by-stage by Ranger.
Parameters:
- xl_ref (str) – The full url, populated on parsing.
- sh_name (str) – Parsed sheet name (or index, but still as string), populated on parsing.
Note: If you need the name of the captured sheet, use:
lasso.sheet.get_sheet_ids().ids[0]
- st_edge (Edge) – The 1st edge, populated on parsing.
- nd_edge (Edge) – The 2nd edge, populated on parsing.
- st (Coords) – The top-left targeted coords of the capture-rect, populated on capturing.
- nd (Coords) – The bottom-right targeted coords of the capture-rect, populated on capturing.
- sheet (ABCSheet) – The sheet fetched from the factory, or the ranger’s current sheet, populated after capturing and before reading.
- values – The excel’s table-values captured by the lasso, populated after reading and updated while applying filters.
- call_spec – The call-spec derived from the parsed filters, to be fed into Ranger.make_call().
- base_coords (Coords) – On recursive calls it becomes the base-cell for the 1st edge.
- opts (dict or ChainMap) –
  - Before parsing, they are just any ‘opts’ dict found in the filters.
  - After parsing, a 2-map ChainMap with Ranger.base_opts and options extracted from filters on top.
-
__getnewargs__()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__(_cls, xl_ref=None, url_file=None, sh_name=None, st_edge=None, nd_edge=None, exp_moves=None, call_spec=None, sheet=None, st=None, nd=None, values=None, base_coords=None, opts=None)¶ Create new instance of Lasso(xl_ref, url_file, sh_name, st_edge, nd_edge, exp_moves, call_spec, sheet, st, nd, values, base_coords, opts)
-
__repr__()¶ Return a nicely formatted representation string
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make(iterable)¶ Make a new Lasso object from a sequence or iterable
-
_replace(**kwds)¶ Return a new Lasso object replacing specified fields with new values
-
base_coords¶ Alias for field number 11
-
call_spec¶ Alias for field number 6
-
exp_moves¶ Alias for field number 5
-
nd¶ Alias for field number 9
-
nd_edge¶ Alias for field number 4
-
opts¶ Alias for field number 12
-
sh_name¶ Alias for field number 2
-
sheet¶ Alias for field number 7
-
st¶ Alias for field number 8
-
st_edge¶ Alias for field number 3
-
url_file¶ Alias for field number 1
-
values¶ Alias for field number 10
-
xl_ref¶ Alias for field number 0
-
pandalone.xleash.xlwings_dims_call_spec()[source]¶
A list call-spec for the _redim_filter() filter that imitates results of the xlwings library.
-
class
pandalone.xleash.Cell[source]¶
Bases: pandalone.xleash._parse.Cell
A pair of 1-based strings, denoting the “A1” coordinates of a cell.
The “num” coords (numeric, 0-based) are specified using numpy-arrays (Coords).
-
class
pandalone.xleash.Coords(row, col)¶
Bases: tuple
A pair of 0-based integers denoting the “num” coordinates of a cell.
The “A1” coords (1-based coordinates) are specified using Cell.
-
__getnewargs__()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__(_cls, row, col)¶ Create new instance of Coords(row, col)
-
__repr__()¶ Return a nicely formatted representation string
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make(iterable)¶ Make a new Coords object from a sequence or iterable
-
_replace(**kwds)¶ Return a new Coords object replacing specified fields with new values
-
col¶ Alias for field number 1
-
row¶ Alias for field number 0
-
-
class
pandalone.xleash.Edge[source]¶
Bases: pandalone.xleash._parse.Edge
All the infos required to target a cell.
An Edge contains an A1 Cell as land.
Parameters:
- land (Cell) – the landing-cell
- mov (str) – use None for missing moves.
- mod (str) – one of (+, - or None)
-
class
pandalone.xleash.CallSpec(func, args, kwds)¶
Bases: tuple
The call-specifier for holding the parsed json-filters.
-
__getnewargs__()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__(_cls, func, args=[], kwds={})¶ Create new instance of CallSpec(func, args, kwds)
-
__repr__()¶ Return a nicely formatted representation string
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make(iterable)¶ Make a new CallSpec object from a sequence or iterable
-
_replace(**kwds)¶ Return a new CallSpec object replacing specified fields with new values
-
args¶ Alias for field number 1
-
func¶ Alias for field number 0
-
kwds¶ Alias for field number 2
-
4.1.7. Submodule: pandalone.xleash._parse¶
The syntax-parsing part of xleash.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash._parse.CallSpec(func, args, kwds)¶
Bases: tuple
The call-specifier for holding the parsed json-filters.
-
__getnewargs__()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__(_cls, func, args=[], kwds={})¶ Create new instance of CallSpec(func, args, kwds)
-
__repr__()¶ Return a nicely formatted representation string
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make(iterable)¶ Make a new CallSpec object from a sequence or iterable
-
_replace(**kwds)¶ Return a new CallSpec object replacing specified fields with new values
-
args¶ Alias for field number 1
-
func¶ Alias for field number 0
-
kwds¶ Alias for field number 2
-
-
class
pandalone.xleash._parse.Cell[source]¶
Bases: pandalone.xleash._parse.Cell
A pair of 1-based strings, denoting the “A1” coordinates of a cell.
The “num” coords (numeric, 0-based) are specified using numpy-arrays (Coords).
-
class
pandalone.xleash._parse.Edge[source]¶
Bases: pandalone.xleash._parse.Edge
All the infos required to target a cell.
An Edge contains an A1 Cell as land.
Parameters:
- land (Cell) – the landing-cell
- mov (str) – use None for missing moves.
- mod (str) – one of (+, - or None)
-
pandalone.xleash._parse.Edge_new(row, col, mov=None, mod=None, default=None)[source]¶
Make a new Edge from any non-None values supplied, capitalized as they are, or return nothing.
Returns: an Edge if any value is non-None
Examples:
>>> Edge_new('1', 'a', 'Rul', '-')
Edge(land=Cell(row='1', col='A'), mov='RUL', mod='-')
>>> print(Edge_new('5', '5'))
R5C5
No error checking is performed:
>>> Edge_new('Any', 'foo', 'BaR', '+_&%')
Edge(land=Cell(row='ANY', col='FOO'), mov='BAR', mod='+_&%')
>>> print(Edge_new(None, None, None, None))
None
except where coincidental:
>>> Edge_new(row=0, col=123, mov='BAR', mod=None)
Traceback (most recent call last):
AttributeError: 'int' object has no attribute 'upper'
>>> Edge_new(row=0, col='A', mov=123, mod=None)
Traceback (most recent call last):
AttributeError: 'int' object has no attribute 'upper'
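The behaviour shown in the doctests above (upper-casing strings, passing None through, returning the default when nothing is given) can be approximated as follows. This is a hypothetical re-implementation written only to match those examples; the Cell and Edge tuples are simplified stand-ins and edge_new_sketch is not the library's function.

```python
from collections import namedtuple

# Simplified stand-ins for the library's classes.
Cell = namedtuple("Cell", "row col")
Edge = namedtuple("Edge", "land mov mod")


def _upper_or_none(value):
    # No type checking on purpose: non-strings fail with AttributeError.
    return value if value is None else value.upper()


def edge_new_sketch(row, col, mov=None, mod=None, default=None):
    """Build an Edge from any non-None values, else return `default`."""
    if row is None and col is None and mov is None and mod is None:
        return default
    land = Cell(row=_upper_or_none(row), col=_upper_or_none(col))
    # `mod` is stored as given; only the coordinates and moves are upper-cased.
    return Edge(land=land, mov=_upper_or_none(mov), mod=mod)


edge = edge_new_sketch('1', 'a', 'Rul', '-')
```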
-
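pandalone.xleash._parse._excel_str_translator = {8220: 34, 8221: 34}¶
Excel uses the curly double-quote chars (code points 8220 and 8221) for double-quotes, which are not valid in JSON strings; this table maps them to the plain ASCII double-quote (34).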
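A translation table of this shape is what str.translate() expects. The sketch below copies the mapping shown above into a local variable and cleans a filter string before handing it to the json module; the sample string is made up for illustration.

```python
import json

# Same mapping as above: curly quotes (U+201C, U+201D) -> ASCII '"' (34).
excel_str_translator = {8220: 34, 8221: 34}

# A filter as it might arrive after Excel "smartened" the quotes.
pasted = '{\u201cdims\u201d: 1}'
cleaned = pasted.translate(excel_str_translator)
filters = json.loads(cleaned)
```

Without the translation step, json.loads() would reject the string, because the curly quotes do not delimit JSON strings.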
-
pandalone.xleash._parse._parse_xlref(xlref)[source]¶
Parse a xl-ref into a dict.
Parameters: xlref (str) – A url-string abiding to the xl-ref syntax.
Returns: A dict with all fields, with None for those missing.
Return type: dict
Examples:
>>> res = parse_xlref('workbook.xlsx#Sheet1!A1(DR+):Z20(UL):L1U2R1D1:'
...                   '{"opts":{}, "func": "foo"}')
>>> sorted(res.items())
[('call_spec', CallSpec(func='foo', args=[], kwds={})),
 ('exp_moves', 'L1U2R1D1'),
 ('nd_edge', Edge(land=Cell(row='20', col='Z'), mov='UL', mod=None)),
 ('opts', {}),
 ('sh_name', 'Sheet1'),
 ('st_edge', Edge(land=Cell(row='1', col='A'), mov='DR', mod='+')),
 ('url_file', 'workbook.xlsx'),
 ('xl_ref', 'workbook.xlsx#Sheet1!A1(DR+):Z20(UL):L1U2R1D1:{"opts":{}, "func": "foo"}')]
Shortcut for all sheet from top-left to bottom-right full-cells:
>>> res = parse_xlref('#:')
>>> sorted(res.items())
[('call_spec', None),
 ('exp_moves', None),
 ('nd_edge', Edge(land=Cell(row='_', col='_'), mov=None, mod=None)),
 ('opts', None),
 ('sh_name', None),
 ('st_edge', Edge(land=Cell(row='^', col='^'), mov=None, mod=None)),
 ('url_file', None),
 ('xl_ref', '#:')]
Errors:
>>> parse_xlref('A1(DR)Z20(UL)')
Traceback (most recent call last):
SyntaxError: No fragment-part (starting with '#'): A1(DR)Z20(UL)
>>> parse_xlref('#A1(DR)Z20(UL)') ## Missing ':'.
Traceback (most recent call last):
SyntaxError: Not an `xl-ref` syntax: A1(DR)Z20(UL)
But as soon as syntax is matched, subsequent errors raised are ValueErrors:
>>> parse_xlref("#A1:B1:{'Bad_JSON_str'}")
Traceback (most recent call last):
ValueError: Filters are not valid JSON: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
  JSON: {'Bad_JSON_str'}
-
pandalone.xleash._parse._regular_xlref_regex= re.compile('\n ^\\s*(?:(?P<sh_name>[^!]+)?!)? # xl sheet name\n (?: # 1st-edge\n (?:\n (?:\n , re.IGNORECASE|re.DOTALL|re.VERBOSE)¶ The regex for parsing regular xl-ref.
-
pandalone.xleash._parse._repeat_moves(moves, times=None)[source]¶
Returns an iterator that repeats moves x times, or infinitely if unspecified.
Used when parsing primitive directions.
Returns: An iterator of the moves
Return type: iterator
Examples:
>>> list(_repeat_moves('LUR', '3'))
['LUR', 'LUR', 'LUR']
>>> list(_repeat_moves('ABC', '0'))
[]
>>> _repeat_moves('ABC') ## infinite repetitions
repeat('ABC')
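The repeat(...) objects in the doctests above suggest that itertools.repeat is at work. A minimal sketch under that assumption follows; repeat_moves_sketch is a hypothetical name, and the count is taken as a digit-string because that is how the examples pass it.

```python
import itertools


def repeat_moves_sketch(moves, times=None):
    """Repeat `moves` `times` times (a digit-string), or forever when None."""
    if times is None:
        return itertools.repeat(moves)
    return itertools.repeat(moves, int(times))


three = list(repeat_moves_sketch('LUR', '3'))
# An infinite iterator must be sliced before it is turned into a list.
first_five = list(itertools.islice(repeat_moves_sketch('D'), 5))
```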
-
pandalone.xleash._parse.parse_call_spec(call_spec_values)[source]¶
Parse call-specifier from json-filters.
Parameters: call_spec_values – This is a non-null structure specifying some function call in the filter part, which can be either:
- string: "func_name"
- list: ["func_name", ["arg1", "arg2"], {"k1": "v1"}] where the last 2 parts are optional and can be given in any order;
- object: {"func": "func_name", "args": ["arg1"], "kwds": {"k":"v"}} where the args and kwds are optional.
Returns: the 3-tuple func, args=(), kwds={} with the defaults as shown when missing.
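The three accepted shapes can be normalized with a few type checks. The sketch below is an illustration only, not the library's code: parse_call_spec_sketch is a hypothetical name, it returns a plain tuple instead of a CallSpec, and it uses an empty list as the default for args.

```python
def parse_call_spec_sketch(spec):
    """Normalize a string / list / object call-spec into (func, args, kwds)."""
    if isinstance(spec, str):
        return spec, [], {}
    if isinstance(spec, list):
        func, args, kwds = spec[0], [], {}
        # The two optional parts may come in either order; tell them apart by type.
        for part in spec[1:]:
            if isinstance(part, dict):
                kwds = part
            elif isinstance(part, list):
                args = part
            else:
                raise ValueError("Unexpected call-spec part: %r" % (part,))
        return func, args, kwds
    if isinstance(spec, dict):
        return spec["func"], spec.get("args", []), spec.get("kwds", {})
    raise ValueError("Unsupported call-spec: %r" % (spec,))


from_list = parse_call_spec_sketch(["redim", {"col": 1}, [2, 3]])
from_obj = parse_call_spec_sketch({"func": "df", "kwds": {"header": 0}})
```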
-
pandalone.xleash._parse.parse_expansion_moves(exp_moves)[source]¶
Parse rect-expansion into a list of dir-letters iterables.
Parameters: exp_moves – A string with a sequence of primitive moves, e.g. L1U1R1D1
Returns: A list of primitive-dir chains.
Return type: list
Examples:
>>> res = parse_expansion_moves('lu1urd?')
>>> res
[repeat('L'), repeat('U', 1), repeat('UR'), repeat('D', 1)]
# infinite generator
>>> [next(res[0]) for i in range(10)]
['L', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'L']
>>> list(res[1])
['U']
>>> parse_expansion_moves('1LURD')
Traceback (most recent call last):
ValueError: Invalid rect-expansion(1LURD) due to: 'NoneType' object has no attribute 'groupdict'
-
pandalone.xleash._parse.parse_xlref(xlref)[source]¶
Like _parse_xlref() but also tries if xlref is encased by delimiter chars /\"$%&.
See also _encase_regex
-
pandalone.xleash._parse.parse_xlref_fragment(xlref_fragment)[source]¶
Parses a xl-ref fragment, i.e. anything to the right of the hash (#).
Parameters: xlref_fragment (str) – the url-fragment part of the xl-ref string, including the '#' char.
Returns: dictionary containing the following parameters:
- sheet: (str, int, None) i.e. sheet_name
- st_edge: (Edge, None) the 1st-ref, with raw cell, i.e. Edge(land=Cell(row='8', col='UPT'), mov='LU', mod='-')
- nd_edge: (Edge, None) the 2nd-ref, with raw cell, i.e. Edge(land=Cell(row='_', col='.'), mov='D', mod='+')
- exp_moves: (sequence, None), i.e. LDL1 as parsed by parse_expansion_moves()
- js_filt: dict i.e. {"dims": 1}
Return type: dict
Examples:
>>> res = parse_xlref_fragment('Sheet1!A1(DR+):Z20(UL):L1U2R1D1:'
...                            '{"opts":{}, "func": "foo"}')
>>> sorted(res.items())
[('call_spec', CallSpec(func='foo', args=[], kwds={})),
 ('exp_moves', 'L1U2R1D1'),
 ('nd_edge', Edge(land=Cell(row='20', col='Z'), mov='UL', mod=None)),
 ('opts', {}),
 ('sh_name', 'Sheet1'),
 ('st_edge', Edge(land=Cell(row='1', col='A'), mov='DR', mod='+'))]
Shortcut for all sheet from top-left to bottom-right full-cells:
>>> res = parse_xlref_fragment(':')
>>> sorted(res.items())
[('call_spec', None),
 ('exp_moves', None),
 ('nd_edge', Edge(land=Cell(row='_', col='_'), mov=None, mod=None)),
 ('opts', None),
 ('sh_name', None),
 ('st_edge', Edge(land=Cell(row='^', col='^'), mov=None, mod=None))]
Errors:
>>> parse_xlref_fragment('A1(DR)Z20(UL)')
Traceback (most recent call last):
SyntaxError: Not an `xl-ref` syntax: A1(DR)Z20(UL)
4.1.8. Submodule: pandalone.xleash.io¶
Backends for opening sheets from various sources.
4.1.9. Submodule: pandalone.xleash.io.backend¶
The manager and the base for all backends fetching cells from actual workbooks and sheets.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash.io.backend.ABCBackend[source]¶
Bases: abc.ABC
A plugin for a backend must implement and add instances into io_backends.
-
class
pandalone.xleash.io.backend.ABCSheet[source]¶
Bases: abc.ABC
A delegating-to-backend factory and sheet-wrapper with utility methods.
Parameters:
- _states_matrix (np.ndarray) – The states-matrix cached, so recreate object to refresh it.
- _margin_coords (dict) – limits used by _resolve_cell(), cached, so recreate object to refresh it.
Resource management is outside of the scope of this class, and must happen in the backend workbook/sheet instance.
xlrd examples:
>>> import xlrd
>>> with xlrd.open_workbook(self.tmp) as wb:
...     sheet = xleash.xlrdSheet(wb.sheet_by_name('Sheet1'))
...     ## Do whatever
win32 examples:
>>> with dsgdsdsfsd as wb:
...     sheet = xleash.win32Sheet(wb.sheet['Sheet1'])
TODO: Win32 Sheet example
-
_read_margin_coords()[source]¶
Override if possible to read (any of the) limits directly from the sheet.
Returns: the 2 coords of the top-left & bottom-right full cells; any of the coords can be None. By default returns (None, None).
Return type: (Coords, Coords)
Raise: EmptyCaptureException if sheet empty
-
_read_states_matrix()[source]¶
Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
-
get_margin_coords()[source]¶
Extract (and cache) margins either internally or from margin_coords_from_states_matrix().
Returns: the resolved top-left and bottom-right xleash.Coords
Return type: tuple
Raise: EmptyCaptureException if sheet empty
-
get_sheet_ids()[source]¶
Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type: SheetId or None
-
get_states_matrix()[source]¶
Read and cache the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
Raise: EmptyCaptureException if sheet empty
-
read_rect(st, nd)[source]¶
Fetch the actual values from the backend Excel-sheet.
Returns: Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Raise: EmptyCaptureException (optionally) if sheet empty
-
class
pandalone.xleash.io.backend.ArraySheet(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶
Bases: pandalone.xleash.io.backend.ABCSheet
A sample ABCSheet made out of 2D-list or numpy-arrays, for facilitating tests.
-
__init__(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶
Initialize self. See help(type(self)) for accurate signature.
-
_read_states_matrix()[source]¶
Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
-
get_sheet_ids()[source]¶
Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type: SheetId or None
-
read_rect(st, nd)[source]¶
Fetch the actual values from the backend Excel-sheet.
Returns: Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Raise: EmptyCaptureException (optionally) if sheet empty
-
-
class
pandalone.xleash.io.backend.SheetId(book, ids)¶
Bases: tuple
-
__getnewargs__()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__(_cls, book, ids)¶ Create new instance of SheetId(book, ids)
-
__repr__()¶ Return a nicely formatted representation string
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make(iterable)¶ Make a new SheetId object from a sequence or iterable
-
_replace(**kwds)¶ Return a new SheetId object replacing specified fields with new values
-
book¶ Alias for field number 0
-
ids¶ Alias for field number 1
-
-
class
pandalone.xleash.io.backend.SheetsFactory(backends=None)[source]¶
Bases: pandalone.xleash.io.backend.SimpleSheetsFactory
A caching-store of ABCSheet instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends.
Variables: _cached_sheets (dict) – A cache of all _Spreadsheets accessed so far, keyed by multiple keys generated by _derive_sheet_keys().
- To avoid opening non-trivial workbooks, use the add_sheet() to pre-populate this cache with them.
- It is a resource-manager for contained sheets, so it can be used with a with statement.
-
__init__(backends=None)[source]¶
Parameters: backends (list or None) – The list of backends to consider when opening sheets. If it evaluates to false, io_backends assumed.
-
_derive_sheet_keys(sheet, wb_ids=None, sh_ids=None)[source]¶
Returns the product of user-specified and sheet-internal keys.
Parameters:
- wb_ids – a single or a sequence of extra workbook-ids (e.g. file, url)
- sh_ids – a single or a sequence of extra sheet-ids (e.g. name, index, None)
-
class
pandalone.xleash.io.backend.SimpleSheetsFactory(backends=None)[source]¶
Bases: object
Asks backends to bid for creating ABCSheet instances; the client should handle resources.
Backends are taken from io_backends or specified during construction.
-
pandalone.xleash.io.backend.margin_coords_from_states_matrix(states_matrix)[source]¶
Returns top-left/bottom-right margins of full cells from a states-matrix.
May be used by ABCSheet.get_margin_coords() if a backend does not report the sheet-margins internally.
Parameters: states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
Returns: the 2 coords of the top-left & bottom-right full cells
Return type: (Coords, Coords)
Examples:
>>> states_matrix = np.asarray([
...     [0, 0, 0],
...     [0, 1, 0],
...     [0, 1, 1],
...     [0, 0, 1],
... ])
>>> margins = margin_coords_from_states_matrix(states_matrix)
>>> margins
(Coords(row=1, col=1), Coords(row=3, col=2))
Note that the bottom-right cell is not the same as the states_matrix size:
>>> states_matrix = np.asarray([
...     [0, 0, 0, 0],
...     [0, 1, 0, 0],
...     [0, 1, 1, 0],
...     [0, 0, 1, 0],
...     [0, 0, 0, 0],
... ])
>>> margin_coords_from_states_matrix(states_matrix) == margins
True
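The idea behind the margins, i.e. the first and last row and column that hold any full cell, can be sketched without numpy. This is not the library's implementation: margin_coords_sketch is a hypothetical name, it works on plain lists of lists, returns plain tuples instead of Coords, and returns None for an all-empty matrix, which is an assumption of this sketch.

```python
def margin_coords_sketch(states_matrix):
    """Return ((top, left), (bottom, right)) of the full cells, or None."""
    full_rows = [r for r, row in enumerate(states_matrix) if any(row)]
    if not full_rows:
        return None  # assumption: nothing to report for an empty sheet
    n_cols = max(len(row) for row in states_matrix)
    full_cols = [
        c for c in range(n_cols)
        if any(row[c] for row in states_matrix if c < len(row))
    ]
    return (full_rows[0], full_cols[0]), (full_rows[-1], full_cols[-1])


small = [[0, 0, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1]]
padded = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
margins = margin_coords_sketch(small)
```

Padding the matrix with empty rows or columns on the bottom-right leaves the margins unchanged, which is the point made by the second doctest above.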
4.1.10. Submodule: pandalone.xleash.io._xlrd¶
Implements the xlrd backend of xleash that reads in-file Excel-spreadsheets.
-
class
pandalone.xleash.io._xlrd.XlrdBackend[source]¶
-
class
pandalone.xleash.io._xlrd.XlrdSheet(sheet, book_fname, epoch1904=False)[source]¶
Bases: pandalone.xleash.io.backend.ABCSheet
The xlrd workbook wrapper required by the xleash library.
-
__init__(sheet, book_fname, epoch1904=False)[source]¶
Initialize self. See help(type(self)) for accurate signature.
-
_read_margin_coords()[source]¶
Override if possible to read (any of the) limits directly from the sheet.
Returns: the 2 coords of the top-left & bottom-right full cells; any of the coords can be None. By default returns (None, None).
Return type: (Coords, Coords)
Raise: EmptyCaptureException if sheet empty
-
get_sheet_ids()[source]¶
Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type: SheetId or None
-
-
pandalone.xleash.io._xlrd._open_sheet_by_name_or_index(xlrd_book, wb_id, sheet_id)[source]¶
Parameters: sheet_id (int or str or None) – If None, opens 1st sheet.
-
pandalone.xleash.io._xlrd._parse_cell(xcell, epoch1904=False)[source]¶ Parse a xl-xcell.
Parameters: - xcell (xlrd.sheet.Cell) – an excel xcell
- epoch1904 (bool) – Which date system was in force when this file was last saved. False => 1900 system (the Excel for Windows default). True => 1904 system (the Excel for Macintosh default).
Returns: formatted xcell value
Return type: int, float, datetime.datetime, bool, None, str, datetime.time, float(‘nan’)
Examples:
>>> import xlrd
>>> from xlrd.sheet import Cell
>>> _parse_cell(Cell(xlrd.XL_CELL_NUMBER, 1.2))
1.2
>>> _parse_cell(Cell(xlrd.XL_CELL_DATE, 1.2))
datetime.datetime(1900, 1, 1, 4, 48)
>>> _parse_cell(Cell(xlrd.XL_CELL_TEXT, 'hi'))
'hi'
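To make the effect of `epoch1904` concrete, here is a minimal standard-library sketch of converting an Excel date serial into a `datetime`. The function name `excel_serial_to_datetime` is hypothetical, and this is not xlrd's code (xlrd ships its own converter); it only assumes the usual conventions of the two date systems, including the shifted epoch that compensates for Excel's phantom 29-Feb-1900.

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial, epoch1904=False):
    """Convert an Excel date serial (days, with fractional time) to a datetime."""
    if epoch1904:
        epoch = datetime(1904, 1, 1)       # 1904 system: serial 0 is 1904-01-01
    elif serial < 60:
        epoch = datetime(1899, 12, 31)     # 1900 system, before the phantom leap day
    else:
        epoch = datetime(1899, 12, 30)     # 1900 system, after the phantom leap day
    days = int(serial)
    seconds = int(round((serial - days) * 86400))
    return epoch + timedelta(days=days, seconds=seconds)


print(excel_serial_to_datetime(1.2))
print(excel_serial_to_datetime(1.0, epoch1904=True))
```

The first call reproduces the `datetime.datetime(1900, 1, 1, 4, 48)` result shown in the doctest above.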
4.1.11. Submodule: pandalone.xleash._capture¶
The algorithmic part of capturing.
Prefer accessing the public members from the parent module.
-
pandalone.xleash._capture.CHECK_CELLTYPE = False¶ When True, most coord-functions accept any 2-tuples.
-
exception
pandalone.xleash._capture.EmptyCaptureException[source]¶ Bases: Exception
Thrown when targeting fails.
-
pandalone.xleash._capture._col2num(coord)[source]¶ Resolves special coords or converts Excel A1 columns to zero-based numbers, reporting invalid ones.
Parameters: coord (str) – excel-column coordinate or one of ^_.
Returns: excel column number, >= 0
Return type: int
Examples:
>>> col = _col2num('D')
>>> col
3
>>> _col2num('d') == col
True
>>> _col2num('AaZ')
727
>>> _col2num('10')
9
>>> _col2num(9)
8
Negatives (from left-end) are preserved:
>>> _col2num('AaZ')
727
Fails ugly:
>>> _col2num('%$')
Traceback (most recent call last):
ValueError: substring not found
>>> _col2num([])
Traceback (most recent call last):
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
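The letter-to-number conversion is plain bijective base-26 arithmetic. The sketch below covers only the A1-letters case (not the numeric or special coords that `_col2num()` also accepts); the name `col_letters_to_num` is made up for illustration.

```python
def col_letters_to_num(letters):
    """Convert Excel column letters (case-insensitive) to a zero-based column number."""
    num = 0
    for ch in letters.upper():
        if not 'A' <= ch <= 'Z':
            raise ValueError(f"invalid column letter {ch!r}")
        # Bijective base-26: 'A' counts as 1, 'Z' as 26.
        num = num * 26 + (ord(ch) - ord('A') + 1)
    return num - 1  # shift from 1-based to zero-based


print(col_letters_to_num('D'))
print(col_letters_to_num('AaZ'))
```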
-
pandalone.xleash._capture._expand_rect(states_matrix, r1, r2, exp_moves)[source]¶ Applies the expansion-moves based on the states_matrix.
Parameters:
Returns: a sorted rect top-left/bottom-right
Examples:
>>> states_matrix = np.array([
...     #0  1  2  3  4  5
...     [0, 0, 0, 0, 0, 0], #0
...     [0, 0, 1, 1, 1, 0], #1
...     [0, 1, 0, 0, 1, 0], #2
...     [0, 1, 1, 1, 1, 0], #3
...     [0, 0, 0, 0, 0, 1], #4
... ], dtype=bool)
>>> r1, r2 = (Coords(2, 1), Coords(2, 1))
>>> _expand_rect(states_matrix, r1, r2, 'U')
(Coords(row=2, col=1), Coords(row=2, col=1))
>>> r1, r2 = (Coords(3, 1), Coords(2, 1))
>>> _expand_rect(states_matrix, r1, r2, 'R')
(Coords(row=2, col=1), Coords(row=3, col=4))
>>> r1, r2 = (Coords(2, 1), Coords(6, 1))
>>> _expand_rect(states_matrix, r1, r2, 'r')
(Coords(row=2, col=1), Coords(row=6, col=5))
>>> r1, r2 = (Coords(2, 3), Coords(2, 3))
>>> _expand_rect(states_matrix, r1, r2, 'LURD')
(Coords(row=1, col=1), Coords(row=3, col=4))
-
pandalone.xleash._capture._extract_states_vector(states_matrix, dn_coords, land, mov)[source]¶ Extract a slice from the states-matrix by starting from land and following mov.
-
pandalone.xleash._capture._resolve_cell(cell, up_coords, dn_coords, base_coords=None)[source]¶ Translates any special coords to absolute ones.
To get the margin_coords, use one of:
- ABCSheet.get_margin_coords()
- io.backend.margin_coords_from_states_matrix()
Parameters:
Returns: the resolved cell-coords
Return type:
Examples:
>>> up = Coords(1, 2)
>>> dn = Coords(10, 6)
>>> base = Coords(40, 50)
>>> _resolve_cell(Cell(col='B', row='5'), up, dn)
Coords(row=4, col=1)
>>> _resolve_cell(Cell('^', '^'), up, dn)
Coords(row=1, col=2)
>>> _resolve_cell(Cell('_', '_'), up, dn)
Coords(row=10, col=6)
>>> base == _resolve_cell(Cell('.', '.'), up, dn, base)
True
>>> _resolve_cell(Cell('-1', '-2'), up, dn)
Coords(row=10, col=5)
>>> _resolve_cell(Cell('A', 'B'), up, dn)
Traceback (most recent call last):
ValueError: invalid cell(Cell(row='A', col='B')) due to: invalid row('A') due to: invalid literal for int() with base 10: 'A'
But notice when base-cell missing:
>>> _resolve_cell(Cell('1', '.'), up, dn)
Traceback (most recent call last):
ValueError: invalid cell(Cell(row='1', col='.')) due to: Cannot resolve `relative-col` without `base-coord`!
-
pandalone.xleash._capture._resolve_coord(cname, cfunc, coord, up_coord, dn_coord, base_coords=None)[source]¶ Translates special coords or converts Excel string 1-based rows/cols to zero-based, reporting invalid ones.
Parameters:
- cname (str) – the coord-name, one of ‘row’, ‘column’
- cfunc (function) – the function to convert coord str --> int
- coord (int, str) – the “A1” coord to translate
- up_coord (int) – the resolved top or left margin zero-based coordinate
- dn_coord (int) – the resolved bottom or right margin zero-based coordinate
- base_coords (int, None) – the resolved basis for dependent coord, if any
Returns: the resolved coord or None if it were not a special coord.
Row examples:
>>> cname = 'row'
>>> r0 = _resolve_coord(cname, _row2num, '1', 1, 10)
>>> r0
0
>>> r0 == _resolve_coord(cname, _row2num, 1, 1, 10)
True
>>> _resolve_coord(cname, _row2num, '^', 1, 10)
1
>>> _resolve_coord(cname, _row2num, '_', 1, 10)
10
>>> _resolve_coord(cname, _row2num, '.', 1, 10, 13)
13
>>> _resolve_coord(cname, _row2num, '-3', 0, 10)
8
But notice when base-cell missing:
>>> _resolve_coord(cname, _row2num, '.', 0, 10, base_coords=None)
Traceback (most recent call last):
ValueError: Cannot resolve `relative-row` without `base-coord`!
Other ROW error-checks:
>>> _resolve_coord(cname, _row2num, '0', 0, 10)
Traceback (most recent call last):
ValueError: invalid row('0') due to: Uncooked-coord cannot be zero!
>>> _resolve_coord(cname, _row2num, 'a', 0, 10)
Traceback (most recent call last):
ValueError: invalid row('a') due to: invalid literal for int() with base 10: 'a'
>>> _resolve_coord(cname, _row2num, None, 0, 10)
Traceback (most recent call last):
ValueError: invalid row(None) due to: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Column examples:
>>> cname = 'column'
>>> _resolve_coord(cname, _col2num, 'A', 1, 10)
0
>>> _resolve_coord(cname, _col2num, 'DADA', 1, 10)
71084
>>> _resolve_coord(cname, _col2num, '.', 1, 10, 13)
13
>>> _resolve_coord(cname, _col2num, '-4', 0, 10)
7
And COLUMN error-checks:
>>> _resolve_coord(cname, _col2num, None, 0, 10)
Traceback (most recent call last):
ValueError: invalid column(None) due to: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
>>> _resolve_coord(cname, _col2num, 0, 0, 10)
Traceback (most recent call last):
ValueError: invalid column(0) due to: Uncooked-coord cannot be zero!
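The resolution rules that the row doctests above exercise can be summarised in a few lines. The sketch below handles rows only and uses a hypothetical name, `resolve_row`; the handling of negative coords (counting back from the bottom margin) is inferred from the doctest results, not taken from the library's source.

```python
def resolve_row(coord, up, dn, base=None):
    """Resolve an "uncooked" row coord into a zero-based row number."""
    if coord == '^':            # top margin
        return up
    if coord == '_':            # bottom margin
        return dn
    if coord == '.':            # relative to a previously resolved cell
        if base is None:
            raise ValueError("Cannot resolve relative-row without base-coord!")
        return base
    num = int(coord)            # raises ValueError for non-numeric strings
    if num == 0:
        raise ValueError("Uncooked-coord cannot be zero!")
    if num > 0:
        return num - 1          # Excel rows are 1-based
    return dn + num + 1         # negative: count back from the bottom margin


print(resolve_row('1', 1, 10))
print(resolve_row('-3', 0, 10))
```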
-
pandalone.xleash._capture._row2num(coord)[source]¶ Resolves special coords or converts Excel 1-based rows to zero-based, reporting invalid ones.
Parameters: coord (str, int) – excel-row coordinate or one of ^_.
Returns: excel row number, >= 0
Return type: int
Examples:
>>> row = _row2num('1')
>>> row
0
>>> row == _row2num(1)
True
Negatives (from bottom) are preserved:
>>> _row2num('-1')
-1
Fails ugly:
>>> _row2num('.')
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '.'
-
pandalone.xleash._capture._sort_rect(r1, r2)[source]¶ Sorts rect-vertices in a 2D-array (with vertices in rows).
Example:
>>> _sort_rect((5, 3), (4, 6))
array([[4, 3],
       [5, 6]])
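Sorting a rect means sorting each axis independently, so that the first vertex ends up top-left and the second bottom-right. A numpy-free sketch with the hypothetical name `sort_rect`, returning tuples instead of an array:

```python
def sort_rect(r1, r2):
    """Return (top_left, bottom_right) by sorting rows and columns independently."""
    rows, cols = (sorted(axis) for axis in zip(r1, r2))
    return (rows[0], cols[0]), (rows[1], cols[1])


print(sort_rect((5, 3), (4, 6)))
```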
-
pandalone.xleash._capture._target_opposite(states_matrix, dn_coords, land, moves, edge_name='')[source]¶ Follow moves from land and stop on the 1st full-cell.
Parameters:
Returns: the identified target-cell’s coordinates
Return type:
Examples:
>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ])
>>> args = (states_matrix, Coords(4, 5))
>>> _target_opposite(*(args + (Coords(0, 0), 'DR')))
Coords(row=3, col=2)
>>> _target_opposite(*(args + (Coords(0, 0), 'RD')))
Coords(row=2, col=3)
It fails if a non-empty target-cell cannot be found, or it ends up beyond bounds:
>>> _target_opposite(*(args + (Coords(0, 0), 'D')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No opposite-target found while moving(D) from landing-Coords(row=0, col=0)!
>>> _target_opposite(*(args + (Coords(0, 0), 'UR')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No opposite-target found while moving(UR) from landing-Coords(row=0, col=0)!
But notice that the landing-cell may be outside of bounds:
>>> _target_opposite(*(args + (Coords(3, 10), 'L')))
Coords(row=3, col=5)
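One reading of the two-letter moves that is consistent with the 'DR' and 'RD' doctests above: scan along the first move until a full cell is met; if the scan runs off the sheet, shift the starting point one step along the second move and scan again. The sketch below implements that reading on nested lists. The name `target_opposite` is hypothetical, it returns None instead of raising, and, unlike the real function, it assumes the landing cell lies inside the matrix.

```python
MOVES = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}


def target_opposite(states, land, moves):
    """Return the first full cell met while scanning, or None if there is none."""
    n_rows, n_cols = len(states), len(states[0])

    def inside(r, c):
        return 0 <= r < n_rows and 0 <= c < n_cols

    d_row, d_col = MOVES[moves[0]]
    shift = MOVES[moves[1]] if len(moves) > 1 else None
    row, col = land
    while inside(row, col):
        r, c = row, col
        while inside(r, c):              # scan along the first move
            if states[r][c]:
                return (r, c)
            r, c = r + d_row, c + d_col
        if shift is None:
            break
        row, col = row + shift[0], col + shift[1]   # restart one step further
    return None


states = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
]
print(target_opposite(states, (0, 0), 'DR'))
print(target_opposite(states, (0, 0), 'RD'))
```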
-
pandalone.xleash._capture._target_same(states_matrix, dn_coords, land, moves, edge_name='')[source]¶ Scan the exterior row and column on the specified moves and stop on the last full-cell.
Parameters:
Returns: the identified target-cell’s coordinates
Return type:
Examples:
>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ])
>>> args = (states_matrix, Coords(4, 5))
>>> _target_same(*(args + (Coords(4, 5), 'U')))
Coords(row=2, col=5)
>>> _target_same(*(args + (Coords(4, 5), 'L')))
Coords(row=4, col=2)
>>> _target_same(*(args + (Coords(4, 5), 'UL', )))
Coords(row=2, col=2)
It fails if landing is empty or beyond bounds:
>>> _target_same(*(args + (Coords(2, 2), 'DR')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No same-target found while moving(DR) from landing-Coords(row=2, col=2)!
>>> _target_same(*(args + (Coords(10, 3), 'U')))
Traceback (most recent call last):
pandalone.xleash._capture.EmptyCaptureException: No same-target found while moving(U) from landing-Coords(row=10, col=3)!
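The "same-state" targeting shown above can be sketched as follows: starting from a full landing cell, each move is walked independently along the landing row or column for as long as cells stay full, and the row and column reached are then combined. This is why 'UL' yields `(2, 2)` although that cell itself is empty. The name `target_same` is hypothetical and None stands in for the exception.

```python
MOVES = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}


def target_same(states, land, moves):
    """Return the last full cell along each move from a full landing, else None."""
    n_rows, n_cols = len(states), len(states[0])

    def is_full(r, c):
        return 0 <= r < n_rows and 0 <= c < n_cols and bool(states[r][c])

    row, col = land
    if not is_full(row, col):
        return None
    target_row, target_col = row, col
    for mov in moves:
        d_row, d_col = MOVES[mov]
        r, c = row, col
        while is_full(r + d_row, c + d_col):    # keep walking over full cells
            r, c = r + d_row, c + d_col
        if d_row:
            target_row = r      # vertical move decides the row
        else:
            target_col = c      # horizontal move decides the column
    return (target_row, target_col)


states = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1, 1],
]
print(target_same(states, (4, 5), 'UL'))
```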
-
pandalone.xleash._capture._target_same_vector(states_matrix, dn_coords, land, mov)[source]¶ Parameters:
-
pandalone.xleash._capture.coords2Cell(row, col)[source]¶ Make A1 Cell from resolved coords, with rudimentary error-checking.
Examples:
>>> coords2Cell(row=0, col=0)
Cell(row='1', col='A')
>>> coords2Cell(row=0, col=26)
Cell(row='1', col='AA')
>>> coords2Cell(row=10, col='.')
Cell(row='11', col='.')
>>> coords2Cell(row=-3, col=-2)
Traceback (most recent call last):
AssertionError: negative row!
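The reverse conversion, from a zero-based column number back to A1 letters, is the inverse bijective base-26 computation. A small sketch, with the made-up name `num_to_col_letters`, covering only non-negative integers:

```python
def num_to_col_letters(num):
    """Convert a zero-based column number to Excel column letters."""
    if num < 0:
        raise ValueError("negative column!")
    letters = ''
    num += 1                                # work in 1-based numbering
    while num > 0:
        num, rem = divmod(num - 1, 26)
        letters = chr(ord('A') + rem) + letters
    return letters


print(num_to_col_letters(0))
print(num_to_col_letters(26))
```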
-
pandalone.xleash._capture.resolve_capture_rect(states_matrix, up_dn_margins, st_edge, nd_edge=None, exp_moves=None, base_coords=None)[source]¶ Performs targeting, capturing and expansions based on the states-matrix.
To get the margin_coords, use one of:
- ABCSheet.get_margin_coords()
- io.backend.margin_coords_from_states_matrix()
Its results can be fed into read_capture_values().
Parameters:
- states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
- up_dn_margins ((Coords, Coords)) – the top-left/bottom-right coords with full-cells
- st_edge (Edge) – “uncooked” as matched by regex
- nd_edge (Edge) – “uncooked” as matched by regex
- exp_moves (list or None) – Just the parsed string, and not None.
- base_coords (Coords) – The base for a dependent 1st edge.
Returns: a (Coords, Coords) with the 1st and 2nd capture-cell ordered from top-left –> bottom-right.
Return type:
Raises: EmptyCaptureException – When targeting failed, and no target cell identified.
Examples:
>>> from pandalone.xleash import Edge, margin_coords_from_states_matrix
>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ], dtype=bool)
>>> up, dn = margin_coords_from_states_matrix(states_matrix)
>>> st_edge = Edge(Cell('1', 'A'), 'DR')
>>> nd_edge = Edge(Cell('.', '.'), 'DR')
>>> resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
(Coords(row=3, col=2), Coords(row=4, col=2))
Using dependent coordinates for the 2nd edge:
>>> st_edge = Edge(Cell('_', '_'), None)
>>> nd_edge = Edge(Cell('.', '.'), 'UL')
>>> rect = resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
>>> rect
(Coords(row=2, col=2), Coords(row=4, col=5))
Using sheet’s margins:
>>> st_edge = Edge(Cell('^', '_'), None)
>>> nd_edge = Edge(Cell('_', '^'), None)
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True
Walking backwards:
>>> st_edge = Edge(Cell('^', '_'), 'L') # Landing is full, so 'L' ignored.
>>> nd_edge = Edge(Cell('_', '_'), 'L', '+') # '+' or would also stop.
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True
4.1.12. Submodule: pandalone.xleash._filter¶
The high-level functionality, the filtering and recursive lassoing.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash._filter.XLocation(sheet, st, nd, base_coords)¶ Bases: tuple
Fields denoting the position of a sheet/cell while running an element-wise-filter.
In practice, run_filter_elementwise() preserves these fields if the processed ones were None.
-
__getnewargs__()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__(_cls, sheet, st, nd, base_coords)¶ Create new instance of XLocation(sheet, st, nd, base_coords)
-
__repr__()¶ Return a nicely formatted representation string
-
_asdict()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make(iterable)¶ Make a new XLocation object from a sequence or iterable
-
_replace(**kwds)¶ Return a new XLocation object replacing specified fields with new values
-
base_coords¶ Alias for field number 3
-
nd¶ Alias for field number 2
-
sheet¶ Alias for field number 0
-
st¶ Alias for field number 1
-
-
pandalone.xleash._filter._classify_rect_shape(st, nd)[source]¶ Identifies rect from its edge-coordinates (row, col, 2d-table).
Parameters:
Returns: an int based on the input, as follows:
- 0: only st given
- 1: st and nd point to the same cell
- 2: row
- 3: col
- 4: 2d-table
Examples:
>>> _classify_rect_shape((1,1), None)
0
>>> _classify_rect_shape((2,2), (2,2))
1
>>> _classify_rect_shape((2,2), (2,20))
2
>>> _classify_rect_shape((2,2), (20,2))
3
>>> _classify_rect_shape((2,2), (20,20))
4
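The five shape codes listed for `_classify_rect_shape()` follow directly from comparing the two vertices. A minimal sketch, under the hypothetical name `classify_rect_shape` and with (row, col) tuples as input:

```python
def classify_rect_shape(st, nd):
    """Classify a rect: 0 single edge, 1 cell, 2 row, 3 col, 4 2d-table."""
    if nd is None:
        return 0                    # only `st` given
    same_row = st[0] == nd[0]
    same_col = st[1] == nd[1]
    if same_row and same_col:
        return 1                    # both point to the same cell
    if same_row:
        return 2                    # spans columns within one row
    if same_col:
        return 3                    # spans rows within one column
    return 4                        # a proper 2d-table


print([classify_rect_shape((2, 2), nd)
       for nd in (None, (2, 2), (2, 20), (20, 2), (20, 20))])
```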
-
pandalone.xleash._filter._downdim(values, new_ndim)[source]¶ Squeeze it, and then flatten it, before inflating it.
Parameters:
- values – The scalar or 2D-results of Sheet.read_rect()
- new_ndim (int) – The new dimension the result should have
-
pandalone.xleash._filter._redim(values, new_ndim)[source]¶ Reshapes the capture-rect values of read_capture_rect().
Parameters:
- values ((nested) list, *) – The scalar or 2D-results of Sheet.read_rect()
- new_ndim –
Returns: reshaped values
Return type: list of lists, list, *
Examples:
>>> _redim([1, 2], 2)
[[1, 2]]
>>> _redim([[1, 2]], 1)
[1, 2]
>>> _redim([], 2)
[[]]
>>> _redim([[3.14]], 0)
3.14
>>> _redim([[11, 22]], 0)
[11, 22]
>>> arr = [[[11], [22]]]
>>> arr == _redim(arr, None)
True
>>> _redim([[11, 22]], 0)
[11, 22]
-
pandalone.xleash._filter._updim(values, new_ndim)[source]¶ Append trivial dimensions to the left.
Parameters:
- values – The scalar or 2D-results of Sheet.read_rect()
- new_ndim (int) – The new dimension the result should have
-
pandalone.xleash._filter.install_default_filters(filters_dict)[source]¶ Updates the default available filters used by lasso() when constructing its internal Ranger.
Parameters: filters_dict (dict) – The dictionary to update with the default filters.
-
pandalone.xleash._filter.pipe_filter(ranger, lasso, *filters, **kwds)[source]¶ A bulk-filter that applies all call-specifiers one after another on the capture-rect values.
Parameters: filters (list) – the json-parsed call-spec
-
pandalone.xleash._filter.py_filter(ranger, lasso, expr)[source]¶ A bulk-filter that passes values through a python-expression using the asteval library.
The expr may access read-write all locals() of this method (ranger, lasso), the numpy funcs, and the pandalone.xleash module under the xleash variable.
The expr may return either:
- the processed values, or
- an instance of the Lasso, in which case only its opts field is checked and replaced with the original if missing. So better use namedtuple._replace() on the current lasso which exists in the expr’s namespace.
Parameters: expr (str) – The python-expression, which may comprise multiple statements.
-
pandalone.xleash._filter.pyeval_filter(ranger, lasso, filters=(), eval_all=False, include=None, exclude=None, depth=-1)[source]¶ An element-wise-filter that uses asteval to evaluate string values as python expressions.
The expr fetched from capturing may access read-write all locals() of this method (i.e. ranger, lasso), the numpy funcs, and the pandalone.xleash module under the xleash variable.
The expr may return either:
- the processed values, or
- an instance of the Lasso, in which case only its opts field is checked and replaced with the original if missing. So better use namedtuple._replace() on the current lasso which exists in the expr’s namespace.
Parameters:
- eval_all (bool) – If True, raise on 1st error and stop diving cells. Defaults to False.
- filters (list) – Any filters to apply after invoking the element_func.
- include (list or str) – Items to include when diving into “indexed” values. See run_filter_elementwise().
- exclude (list or str) – Items to exclude when diving into “indexed” values. See run_filter_elementwise().
- depth (int or None) – How deep to dive into nested structures, “indexed” or lists. If < 0, no limit. If 0, stops completely. See run_filter_elementwise().
Example:
>>> from pandalone import xleash
>>> expr = '''
... res = array([[0.5, 0.3, 0.1, 0.1]])
... res * res.T
... '''
>>> lasso = Lasso(values=expr, opts={})
>>> with xleash.SheetsFactory() as sf:
...     ranger = xleash.Ranger(sf)
...     pyeval_filter(ranger, lasso).values
array([[0.25, 0.15, 0.05, 0.05],
       [0.15, 0.09, 0.03, 0.03],
       [0.05, 0.03, 0.01, 0.01],
       [0.05, 0.03, 0.01, 0.01]])
-
pandalone.xleash._filter.recursive_filter(ranger, lasso, filters=(), include=None, exclude=None, depth=-1)[source]¶ An element-wise-filter that recursively expands any xl-ref string elements in capture-rect values.
Parameters:
- filters (list) – Any filters to apply after invoking the element_func.
- include (list or str) – Items to include when diving into “indexed” values. See run_filter_elementwise().
- exclude (list or str) – Items to exclude when diving into “indexed” values. See run_filter_elementwise().
- depth (int or None) – How deep to dive into nested structures, “indexed” or lists. If < 0, no limit. If 0, stops completely. See run_filter_elementwise().
-
pandalone.xleash._filter.redim_filter(ranger, lasso, scalar=None, cell=None, row=None, col=None, table=None)[source]¶ A bulk-filter that reshapes and/or transposes captured values, depending on the rect’s shape.
Each dimension might be a single int or None, or a pair [dim, transpose].
-
pandalone.xleash._filter.run_filter_elementwise(ranger, lasso, element_func, filters, include=None, exclude=None, depth=-1, *args, **kwds)[source]¶ Runner of all element-wise filters.
It applies the element_func on elements extracted from lasso.values by treating the latter first as “indexed” objects (Mappings, Series and Dataframes), and if that fails, as nested lists.
The include/exclude filter args work only for “indexed” objects with items() and indexing methods.
- If no filter arg specified, expands for all keys.
- If only include specified, rejects all keys not explicitly contained in this filter arg.
- If only exclude specified, expands all keys not explicitly contained in this filter arg.
- When both include/exclude exist, only those explicitly included are accepted, unless also excluded.
Lower the logging level to see other than syntax-errors on recursion reported on log.
Only those in XLocation are passed recursively.
Parameters:
- element_func (function) – A function implementing the element-wise filter and returning a 2-tuple (is_processed, new_val_or_lasso), like that:
def element_func(ranger, lasso, context, elval):
    proced = False
    try:
        elval = int(elval)
        proced = True
    except ValueError:
        pass
    return proced, elval
Its kwds may contain the include, exclude and depth args. Any exception raised from element_func will cancel the diving.
- filters (list) – Any filters to apply after invoking the element_func.
- include (list or str) – Items to include when diving into “indexed” values. See description above.
- exclude (list or str) – Items to exclude when diving into “indexed” values. See description above.
- depth (int or None) – How deep to dive into nested structures, “indexed” or lists. If < 0, no limit. If 0, stops completely.
Params args: To be relayed to ‘element_func’.
Params kwds: To be relayed to ‘element_func’.
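The four include/exclude rules described for run_filter_elementwise() reduce to a single predicate on each key. The helper below, `select_keys`, is a hypothetical illustration on plain lists, not part of the library:

```python
def select_keys(keys, include=None, exclude=None):
    """Return the keys to dive into, following the include/exclude rules."""
    return [
        key for key in keys
        if (include is None or key in include)        # no include: accept all
        and (exclude is None or key not in exclude)   # exclusion always wins
    ]


keys = ['a', 'b', 'c']
print(select_keys(keys))
print(select_keys(keys, include=['a', 'b']))
print(select_keys(keys, exclude=['b']))
print(select_keys(keys, include=['a', 'b'], exclude=['b']))
```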
4.1.13. Submodule: pandalone.xleash._lasso¶
The high-level functionality, the filtering and recursive lassoing.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash._lasso.Ranger(sheets_factory, base_opts=None, available_filters=None)[source]¶ Bases: object
The director-class that performs all stages required for “throwing the lasso” around rect-values.
Use it when you need to have total control of the procedure and configuration parameters, since no defaults are assumed.
The do_lasso() does the job.
Variables:
- sheets_factory (SheetsFactory) – Factory of sheets from where to parse rect-values; does not close it in the end. May be None, but do_lasso() will scream unless invoked with a context_lasso arg containing a concrete ABCSheet.
- base_opts (dict) – The opts that are deep-copied and used as the defaults for every do_lasso(), whether invoked directly or recursively by recursive_filter(). If unspecified, no opts are used, but this attr is set to an empty dict. See get_default_opts().
- available_filters (dict or None) – The filters available for a xl-ref to use. If None, then uses xleash.installed_filters. Use an empty dict not to use any filters.
- intermediate_lasso (Lasso) – A ('stage', Lasso) pair with the last Lasso instance produced during the last execution of the do_lasso(). Used for inspecting/debugging.
-
__init__(sheets_factory, base_opts=None, available_filters=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
_make_init_Lasso(**context_kwds)[source]¶ Creates the lasso to be used for each new do_lasso() invocation.
-
_parse_and_merge_with_context(xlref, init_lasso)[source]¶ Merges the xl-ref parsed-fields with init_lasso, reporting any errors.
Parameters: init_lasso (Lasso) – Default values to be overridden by non-nulls.
Returns: a Lasso with any non-None parsed-fields updated
-
_resolve_capture_rect(lasso, sheet)[source]¶ Also handles EmptyCaptureException in case opts['no_empty'] != False.
-
pandalone.xleash._lasso.get_default_opts(overrides=None)[source]¶ Default opts used by lasso() when constructing its internal Ranger.
Parameters: overrides (dict or None) – Any items to update the default ones.
-
pandalone.xleash._lasso.lasso(xlref, sheets_factory=None, base_opts=None, available_filters=None, return_lasso=False, **context_kwds)[source]¶ High-level function to lasso around spreadsheet’s rect-regions according to xl-ref strings by using internally a Ranger.
Parameters:
- xlref (str) – a string with the xl-ref format:
<url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
e.g.:
file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
- sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, the new SheetsFactory created is closed afterwards. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
- available_filters (dict or None) – Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
- return_lasso (bool) – If True, values are contained in the returned Lasso instance, along with all other artifacts of the lassoing procedure.
For more debugging help, create a Ranger yourself and inspect the Ranger.intermediate_lasso.
- context_kwds (Lasso) – Default Lasso fields in case parsed ones are None (i.e. you can specify the sheet like that).
Variables: base_opts – Opts affecting the lassoing procedure that are deep-copied and used as the base-opts for every Ranger.do_lasso(), whether invoked directly or recursively by recursive_filter(). Read the code to be sure what are the available choices. Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
Returns: Either the captured & filtered values or the final Lasso, depending on the return_lasso arg.
Example:
sheet = _
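To see how the top-level pieces of the xl-ref format fit together, the following sketch splits an xl-ref into its file-url, sheet and rect/filter parts using plain string operations. The function `split_xlref` is hypothetical; the library's own parser is not shown in this reference, and it additionally breaks the rect part down into edges, expansions and filters, which this sketch leaves untouched.

```python
def split_xlref(xlref):
    """Split `<url_file>#<sheet>!<rest>` into (url, sheet, rest)."""
    url, _, fragment = xlref.partition('#')
    sheet, bang, rest = fragment.partition('!')
    if not bang:
        # No '!' in the fragment: no sheet was given, everything is the rect part.
        sheet, rest = None, fragment
    return url, sheet, rest


example = 'file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}'
print(split_xlref(example))
```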
-
pandalone.xleash._lasso.make_default_Ranger(sheets_factory=None, base_opts=None, available_filters=None)[source]¶ Makes a defaulted Ranger.
Parameters:
- sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, a new SheetsFactory is created. Remember to invoke its SheetsFactory.close() to clear resources from any opened sheets.
- base_opts (dict or None) – Default opts to affect the lassoing, to be merged with defaults; uses get_default_opts().
Read the code to be sure what are the available choices :-(.
- available_filters (dict or None) – The filters available for a xl-ref to use. (xleash.installed_filters used if unspecified).
For instance, to make your own sheets-factory and override options, you may do this:
>>> from pandalone import xleash
>>> with xleash.SheetsFactory() as sf:
...     xleash.make_default_Ranger(sf, base_opts={'lax': True})
<pandalone.xleash._lasso.Ranger object at ...
4.2. Module: pandalone.mappings¶
Hierarchical string-like objects useful for indexing, that can be renamed/relocated at a later stage.
Pstep |
Automagically-constructed relocatable paths for accessing data-tree. |
pmods_from_tuples(pmods_tuples) |
Turns a list of 2-tuples into a pmods hierarchy. |
Pmod([_alias, _steps, _regxs]) |
A path-step mapping forming the pmods-hierarchy. |
Example:
>>> from pandalone.mappings import pmods_from_tuples
>>> pmods = pmods_from_tuples([
... ('', 'deeper/ROOT'),
... ('/abc', 'ABC'),
... ('/abc/foo', 'BAR'),
... ])
>>> p = pmods.step()
>>> p.abc.foo
`BAR`
>>> p._paths()
['deeper/ROOT/ABC/BAR']
- TODO: Implement “anywhere” pmods (//).
-
class
pandalone.mappings.Pmod(_alias=None, _steps={}, _regxs={})[source]¶ Bases: object
A path-step mapping forming the pmods-hierarchy.
The pmods denotes the hierarchy of all mappings, that either rename or relocate path-steps.
A single mapping transforms an “origin” path to a “destination” one (also called “from” and “to” paths).
A mapping always transforms the final path-step, like that:
FROM_PATH       TO_PATH   RESULT_PATH
---------       -------   -----------
/rename/path    foo       --> /rename/foo        ## renaming
/relocate/path  foo/bar   --> /relocate/foo/bar  ## relocation
''              a/b/c     --> /a/b/c             ## Relocate all paths.
/               a/b/c     --> /a/b/c             ## Relocates 1st "empty-str" step.
The pmod is the mapping of that single path-step.
It is possible to match fully on path-steps using regular-expressions, and then to use any captured-groups from the final step into the mapped value:
(/all(.*)/path, foo)
+ all_1/path --> /all_1/foo
+ all_XYZ --> /all_XYZ ## no change
(/all(.*)/path, foo\1)
+ all_1/path --> /all_1/foo_1
If more than one regex match, they are merged in the order declared (the latest one overrides a previous one).
Any exact child-name matches are applied and merged after regexs.
Use pmods_from_tuples() to construct the pmods-hierarchy.
The pmods are used internally by Pstep to map the component-paths of their input & output onto the actual value-tree paths.
Example:
Note
Do not manually construct instances from this class! To construct a hierarchy use the pmods_from_tuples() or pass mappings as the 2nd argument in the Pstep constructor.
You can use it to map paths in bulk, either to rename them:
>>> pmods = pmods_from_tuples([
...     ('/a', 'A'),
...     ('/~b.*', r'BB\g<0>'),              ## Previous match.
...     ('/~b.*/~c.(.*)', r'W\1ER'),        ## Capturing-group(1)
... ])
>>> pmods.map_paths(['/a', '/a/foo'])       ## 1st rule
['/A', '/A/foo']
>>> pmods.map_path('/big/stuff')            ## 2nd rule
'/BBbig/stuff'
>>> pmods.map_path('/born/child')           ## 2nd & 3rd rule
'/BBborn/WildER'
or to relocate them:
>>> pmods = pmods_from_tuples([
...     ('/a', 'A/AA'),
...     ('/~b.*/~c(.*)', r'../C/\1'),
...     ('/~b.*/~.*/~r.*', r'/\g<0>'),
... ])
>>> pmods.map_paths(['/a/foo', '/big/child', '/begin/from/root'])
['/A/AA/foo', '/big/C/hild', '/root']
Here is how you relocate “root” (notice that the '' path is the root):
>>> pmods = pmods_from_tuples([('', '/NEW/ROOT')])
>>> pmods.map_paths(['/a/foo', ''])
['/NEW/ROOT/a/foo', '/NEW/ROOT']
-
__init__(_alias=None, _steps={}, _regxs={})[source]¶ Args are passed only for testing; remember that _regxs must be a (k,v) tuple-list!
Note
Mutable arg-defaults (empty dicts) are knowingly used, to preserve memory; never append into them!
-
_alias¶ (optional) the mapped-name of the pstep for
-
_append_into_regxs(key)[source]¶ Inserts a child-mapping into the _regxs dict.
Parameters: key (str) – the regex-pattern to add
-
_append_into_steps(key)[source]¶ Inserts a child-mapping into the _steps dict.
Parameters: key (str) – the step-name to add
-
_merge(other)[source]¶ Clone and override all its props with props from other-pmod, recursively.
Although it does not modify this, the other or their children pmods, it may “share” (crosslink) them, so pmods MUST NOT be modified later.
Parameters: other (Pmod) – contains the dicts with the overrides
Returns: the cloned merged pmod
Return type: Pmod
Examples:
Look how _steps are merged:
>>> pm1 = Pmod(_alias='pm1', _steps={
...     'a':Pmod(_alias='A'), 'c':Pmod(_alias='C')})
>>> pm2 = Pmod(_alias='pm2', _steps={
...     'b':Pmod(_alias='B'), 'a':Pmod(_alias='AA')})
>>> pm = pm1._merge(pm2)
>>> sorted(pm._steps.keys())
['a', 'b', 'c']
And here it is _regxs merging, which preserves order:
>>> pm1 = Pmod(_alias='pm1',
...            _regxs=[('d', Pmod(_alias='D')),
...                    ('a', Pmod(_alias='A')),
...                    ('c', Pmod(_alias='C'))])
>>> pm2 = Pmod(_alias='pm2',
...            _regxs=[('b', Pmod(_alias='BB')),
...                    ('a', Pmod(_alias='AA'))])
>>> pm1._merge(pm2)
pmod('pm2', {re.compile('d'): pmod('D'), re.compile('c'): pmod('C'), re.compile('b'): pmod('BB'), re.compile('a'): pmod('AA')})
>>> pm2._merge(pm1)
pmod('pm1', {re.compile('b'): pmod('BB'), re.compile('d'): pmod('D'), re.compile('a'): pmod('A'), re.compile('c'): pmod('C')})
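The ordering seen in the _regxs doctest above (overridden keys move to the end, after the surviving originals) can be imitated with plain dicts, which preserve insertion order in Python 3.7+. This is only a flat analogy under the hypothetical name `merge_ordered`; the real _merge() also recurses into child pmods.

```python
def merge_ordered(base, overrides):
    """Merge two dicts so that all `overrides` items come last, in their own order."""
    merged = {k: v for k, v in base.items() if k not in overrides}
    merged.update(overrides)
    return merged


pm1 = {'d': 'D', 'a': 'A', 'c': 'C'}
pm2 = {'b': 'BB', 'a': 'AA'}
print(list(merge_ordered(pm1, pm2).items()))
print(list(merge_ordered(pm2, pm1).items()))
```

Neither input dict is modified, which mirrors the "clone and override" behaviour the docstring describes.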
-
_override_regxs(other)[source]¶ Override this pmod’s _regxs dict with other’s, recursively.
- It may “share” (crosslink) the dict and/or its child-pmods between the two pmod args (self and other).
- No dict is modified (apart from self, which must have been cloned previously by Pmod._merge()), to avoid side-effects in case they were “shared”.
- It preserves dict-ordering so that other order takes precedence (its elements are the last ones).
Parameters:
-
_override_steps(other)[source]¶ Override this pmod’s ‘_steps’ dict with other’s, recursively.
Same as _override_regxs() but without caring for order.
-
_regxs¶ {regex_on_originals –> pmod}
-
_steps¶ {original_name –> pmod}
-
alias(cstep)[source]¶ Like descend() but without merging child-pmods.
Returns: the expanded alias from child/regexs or None
-
descend(cstep)[source]¶ Return the child-pmod, merging any exact child with all matched regexps, along with its regex-expanded alias.
Parameters: cstep (str) – the child path-step of the pmod to return
Returns: the merged-child pmod, along with the alias; both might be None, if nothing matched, or no alias.
Return type: tuple(Pmod, str)
Example:
>>> pm = Pmod(
...     _steps={'a': Pmod(_alias='A')},
...     _regxs=[(r'a\w*', Pmod(_alias='AWord')),
...             (r'a(\d*)', Pmod(_alias=r'A_\1')),
...            ])
>>> pm.descend('a')
(pmod('A'), 'A')
>>> pm.descend('abc')
(pmod('AWord'), 'AWord')
>>> pm.descend('a12')
(pmod('A_\\1'), 'A_12')
>>> pm.descend('BAD')
(None, None)
Notice how children of regexps are merged together:
>>> pm = Pmod(
...     _steps={'a':
...         Pmod(_alias='A', _steps={1: 11})},
...     _regxs=[
...         (r'a\w*', Pmod(_alias='AWord',
...                        _steps={2: Pmod(_alias=22)})),
...         (r'a\d*', Pmod(_alias='ADigit',
...                        _steps={3: Pmod(_alias=33)})),
...     ])
>>> sorted(pm.descend('a')[0]._steps)   ## All children and regexps match.
[1, 2, 3]
>>> pm.descend('aa')[0]._steps          ## Only r'a\w*' matches.
{2: pmod(22)}
>>> sorted(pm.descend('a1')[0]._steps)  ## Both regexps match.
[2, 3]
So it is possible to say:
>>> pm.descend('a1')[0].alias(2)
22
>>> pm.descend('a1')[0].alias(3)
33
>>> pm.descend('a1')[0].descend('BAD')
(None, None)
>>> pm.descend('a$')
(None, None)
but it is better to use map_path() for this.
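The alias expansion seen in the descend() examples can be mimicked with the standard re module alone. The following is a minimal sketch, not pandalone's implementation: expand_alias and its arguments are hypothetical names, and it assumes full-string matching, that an exact child wins over regexps, and that the last matching regexp supplies the alias. Those assumptions merely agree with the results documented above.

```python
import re


def expand_alias(cstep, steps, regxs):
    """Return the alias for `cstep`, or None if nothing matches.

    steps: dict of exact step-name -> alias          (hypothetical structure)
    regxs: list of (pattern, alias-template) pairs, tried in order
    """
    if cstep in steps:
        return steps[cstep]              # assumed: an exact child wins
    alias = None
    for pattern, template in regxs:
        match = re.fullmatch(pattern, cstep)
        if match:
            # Substitute back-references such as \1 from the match.
            alias = match.expand(template)
    return alias


steps = {"a": "A"}
regxs = [(r"a\w*", "AWord"), (r"a(\d*)", r"A_\1")]

for name in ["a", "abc", "a12", "a$"]:
    print(name, "-->", expand_alias(name, steps, regxs))
```

Because re.fullmatch() is used, a step like 'a$' matches nothing, just as descend('a$') returns (None, None).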
-
map_path(path)[source]¶ Maps a ‘/rooted/path’ using all aliases while descending its child pmods.
It uses any aliases on all child pmods if found.
Parameters: path (str) – a rooted path to transform
Returns: the rooted mapped path or ‘/’ if path was ‘/’
Return type: str or None
Examples:
>>> pmods = pmods_from_tuples([
...     ('/a', 'A/AA'),
...     ('/~a(\\w*)', r'BB\1'),
...     ('/~a\\w*/~d.*', r'D \g<0>'),
...     ('/~a(\\d+)', r'C/\1'),
...     ('/~a(\\d+)/~(c.*)', r'CC-/\1'),        # The 1st group is ignored!
...     ('/~a\\d+/~e.*', r'/newroot/\g<0>'),    # Rooted mapping.
... ])
>>> pmods.map_path('/a')
'/A/AA'
>>> pmods.map_path('/a_hi')
'/BB_hi'
>>> pmods.map_path('/a12')
'/C/12'
>>> pmods.map_path('/a12/etc')
'/newroot/etc'
Notice how children from all matching prior-steps are merged:
>>> pmods.map_path('/a12/dow')
'/C/12/D dow'
>>> pmods.map_path('/a12/cow')
'/C/12/CC-/cow'
To map root use ‘’ which matches before the 1st slash(‘/’):
>>> pmods = pmods_from_tuples([('', 'New/Root'),])  ## Relative
>>> pmods
pmod({'': pmod('New/Root')})
>>> pmods.map_path('/for/plant')
'New/Root/for/plant'
>>> pmods_from_tuples([('', '/New/Root'),]).map_path('/for/plant')
'/New/Root/for/plant'
Note
Using slash(‘/’) for “from” path will NOT map root:
>>> pmods = pmods_from_tuples([('/', 'New/Root'),])
>>> pmods
pmod({'': pmod({'': pmod('New/Root')})})
>>> pmods.map_path('/for/plant')
'/for/plant'
>>> pmods.map_path('//for/plant')
'/New/Root/for/plant'
'/root'
but ‘’ always remains unchanged (whole document):
>>> pmods.map_path('')
''
-
step(pname='', alias=None)[source]¶ Create a new Pstep having this pmod as its mappings.
If no pname is specified, creates a root pstep.
Delegates to Pstep.__new__().
-
class pandalone.mappings.Pstep[source]¶
Bases: str
Automagically-constructed relocatable paths for accessing data-tree.
The “magic” auto-creates psteps as they are referenced, which makes writing code that accesses data-tree paths natural, while at the same time the “model” of those tree-data gets discovered.
Each pstep keeps internally the name of a data-tree step, which, when created through recursive referencing, coincides with the parent’s branch leading to this step. That name can be modified with Pmod so the same data-accessing code can refer to differently-named values in the data-tree.
Variables:
- _csteps (dict) – the child-psteps by their name (default None)
- _pmod (dict) – path-modifications used to construct this and relayed to children (default None)
- _locked (int) – one of Pstep.CAN_RELOCATE (default), Pstep.CAN_RENAME, Pstep.LOCKED (neither from the above).
- _tags (set) – A set of strings (default ())
- _schema (dict) – json-schema data.
See __new__() for the internal constructor.
Usage:
Use Pmod.step() to construct a root pstep from mappings. Specify a string argument to construct a relative pstep-hierarchy.
Just referencing (non-private) attributes creates them.
Private attributes and functions (starting with _) exist for specific operations (i.e. for specifying json-schema, or for collecting all paths).
Assignments are only allowed for string-values, or to private attributes:
>>> p = Pstep()
>>> p.assignments = 12
Traceback (most recent call last):
AssertionError: Cannot assign '12' to '/assignments!
>>> p._but_hidden = 'Ok'
Use _paths() to get all defined paths so far.
Construction:
>>> Pstep()
``
>>> Pstep('a')
`a`
Notice that psteps are surrounded with the back-tick char(‘`’).
Paths are created implicitly as they are referenced:
>>> m = {'a': 1, 'abc': 2, 'cc': 33}
>>> p = Pstep('a')
>>> assert m[p] == 1
>>> assert m[p.abc] == 2
>>> assert m[p.a321.cc] == 33
>>> sorted(p._paths())
['a/a321/cc', 'a/abc']
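The idea of attribute access that creates path-steps on the fly can be illustrated without pandalone. The sketch below is a toy stand-in, not the real Pstep: the Step class is hypothetical, it supports neither pmods, tags, locking nor schema, and attribute names that clash with str methods (e.g. upper) would not create children.

```python
class Step(str):
    """A str subclass whose public attributes are auto-created child steps."""

    def __new__(cls, name=""):
        self = super().__new__(cls, name)
        self._children = {}          # child-steps by their name
        return self

    def __getattr__(self, name):
        # Called only when normal lookup fails; private names never auto-create.
        if name.startswith("_"):
            raise AttributeError(name)
        child = self._children.get(name)
        if child is None:
            child = self._children[name] = Step(name)
        return child

    def _paths(self, prefix=()):
        """Return the sorted '/'-joined paths of all leaves created so far."""
        here = prefix + (str(self),)
        if not self._children:
            return ["/".join(here)]
        paths = []
        for child in self._children.values():
            paths.extend(child._paths(here))
        return sorted(paths)


m = {"a": 1, "abc": 2, "cc": 33}
p = Step("a")
print(m[p], m[p.abc], m[p.a321.cc])   # steps hash and compare like plain strings
print(p._paths())
```

Since each step is a real str, it can index dictionaries directly, while the tree of children records which paths the code has touched.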
Any “path-mappings” or “pmods” may be specified during construction:
>>> from pandalone.mappings import pmods_from_tuples
>>> maps = [
...     ('', 'deeper/ROOT'),
...     ('/abc', 'ABC'),
...     ('/abc/foo', 'BAR'),
... ]
>>> p = Pstep('', pmods_from_tuples(maps))
OR
>>> pmods = pmods_from_tuples(maps)
>>> p = pmods.step()
>>> p.abc.foo
`BAR`
>>> p._paths()
['deeper/ROOT/ABC/BAR']
but exceptions are thrown if mapping any step marked as “locked”:
>>> p.abc.foo._locked  ## 3: CAN_RELOCATE
3
>>> p.abc.foo._lock  ## Screams, because `foo` is already mapped.
Traceback (most recent call last):
ValueError: Cannot rename/relocate 'foo'-->'BAR' due to LOCKED!
Warning
Creating an empty('') step in some paths will “root” the path:
>>> p = Pstep()
>>> _ = p.a1.b
>>> _ = p.A2
>>> p._paths()
['/A2', '/a1/b']
>>> _ = p.a1.a2.c
>>> _ = p.a1.a2 = ''
>>> p._paths()
['/A2', '/a1/b', '/c']
-
static __new__(cls, pname=None, maps=None, alias=None, *tags)[source]¶ Constructs a string with str-content which may come from the mappings.
These are the valid argument combinations:
pname='attr_name',
pname='attr_name', _alias='Mass [kg]'
pname='attr_name', maps=Pmod
pname='attr_name', maps=Pstep
pname='attr_name', maps=Pstep, _alias='Mass [kg]'
Parameters:
- pname (str) – this pstep’s name which must coincide with the name of the parent-pstep’s attribute holding this pstep. It is stored at _orig and if no alias and unmapped by pmod, this becomes the alias. To create an “absolute” pstep, do not set this or alias args.
- maps (Pmod or Pstep) – It can be either:
  - the mappings for this pstep,
  - another pstep to clone attributes from (used when replacing an existing child-pstep), or
  - None.
  The mappings will apply only if Pmod.descend() matches pname, and they will derive the alias.
- alias (str) – Will become the super-str object when no mappings specified (maps is a dict from some prototype pstep). It gets jsonpointer-escaped if it exists (see pandata.escape_jsonpointer_part())
- tags – Arguments for calling _tag() afterwards.
-
_derrive_map_tuples()[source]¶ Recursively extract (cmap --> alias) pairs from the pstep-hierarchy.
-
_fix¶ Sets locked=CAN_RENAME.
Returns: self
Raises: ValueError if the pstep has been relocated
-
_iter_hierarchy(prefix_steps=())[source]¶ Breadth-first traversing of pstep-hierarchy.
Parameters: prefix_steps (tuple) – the branch currently being visited is built here.
Returns: yields the visited pstep along with its path (including it)
Return type: (Pstep, [Pstep])
-
_lock¶ Sets locked=LOCKED.
Returns: self, for chained use
Raises: ValueError if the pstep has been renamed/relocated
-
_locked¶ Gets the _locked internal flag, or screams on set when the step is already renamed/relocated.
Prefer using one of _fix or _lock instead.
Parameters: locked – One of CAN_RELOCATE, CAN_RENAME, LOCKED.
Raises: ValueError when setting a stricter lock-value on a renamed/relocated pstep
-
_paths(with_orig=False, tag=None)[source]¶ Return all children-paths (str-list) constructed so far, in a list.
Return type: [str]
Examples:
>>> p = Pstep()
>>> _ = p.a1._tag('inp').b._tag('inp').c
>>> _ = p.a2.b2
>>> p._paths()
['/a1/b/c', '/a2/b2']
>>> p._paths(tag='inp')
['/a1', '/a1/b']
For debugging set with_orig to True:
>>> pmods = pmods_from_tuples([
...     ('', 'ROOT'),
...     ('/a', 'A/AA'),
... ])
>>> p = pmods.step()
>>> _ = p.a.b
>>> p._paths(with_orig=True)
['(-->ROOT)/(a-->A/AA)/b']
-
_schema¶ Updates json-schema-v4 on this pstep (see JSchema).
-
pandalone.mappings._append_step(steps, step)[source]¶ Joins step at the right of steps, respecting ‘/’, ‘..’, ‘.’, ‘’.
Note
The empty-string(‘’) is the “root” for both steps and step. An empty-tuple steps is considered “relative”, equivalent to dot(‘.’).
Example:
>>> _append_step((), 'a')
('a',)
>>> _append_step(('a', 'b'), '..')
('a',)
>>> _append_step(('a', 'b'), '.')
('a', 'b')
Note that an “absolute” path has the 1st-step empty(''), (so the previous paths above were all “relative”):
>>> _append_step(('a', 'b'), '')
('',)
>>> _append_step(('',), '')
('',)
>>> _append_step((), '')
('',)
Dot-dots preserve “relative” and “absolute” paths, respectively, and hence do not coalesce when at the left:
>>> _append_step(('',), '..')
('',)
>>> _append_step(('',), '.')
('',)
>>> _append_step(('a',), '..')
()
>>> _append_step((), '..')
('..',)
>>> _append_step(('..',), '..')
('..', '..')
>>> _append_step((), '.')
()
Single-dots(‘.’) just disappear:
>>> _append_step(('.',), '.')
()
>>> _append_step(('.',), '..')
('..',)
-
pandalone.mappings._clone_attrs(obj)[source]¶ Clone deeply any collection attributes of the passed-in object.
-
pandalone.mappings._forbidden_pstep_attrs= ('get_values', 'Series')¶ Psteps attributes excluded from magic-creation, because searched by pandas’s indexing code.
-
pandalone.mappings._join_paths(*steps)[source]¶ Joins all path-steps in a single string, respecting '/', '..', '.', ''.
Parameters: steps (str) – single json-steps, from left to right
Return type: str
Note
If you use iter_jsonpointer_parts_relaxed() to generate path-steps, the “root” is signified by the empty('') step; not the slash(/)!
Hence a lone slash(/) gets split to an empty step after “root” like that: ('', ''), which generates just “root”('').
Therefore a “folder” (i.e. some/folder/) when split equals ('some', 'folder', ''), which results again in the “root”('')!
Examples:
>>> _join_paths('r', 'a', 'b')
'r/a/b'
>>> _join_paths('', 'a', 'b', '..', 'bb', 'cc')
'/a/bb/cc'
>>> _join_paths('a', 'b', '.', 'c')
'a/b/c'
An empty-step “roots” the remaining path-steps:
>>> _join_paths('a', 'b', '', 'r', 'aa', 'bb')
'/r/aa/bb'
All steps have to be already “split”:
>>> _join_paths('a', 'b', '../bb')
'a/b/../bb'
Dot-dotting preserves “relative” and “absolute” paths, respectively:
>>> _join_paths('..')
'..'
>>> _join_paths('a', '..')
'.'
>>> _join_paths('a', '..', '..', '..')
'../..'
>>> _join_paths('', 'a', '..', '..')
''
Some more special cases:
>>> _join_paths('..', 'a')
'../a'
>>> _join_paths('', '.', '..', '..')
''
>>> _join_paths('.', '..')
'..'
>>> _join_paths('..', '.', '..')
'../..'
See also
_append_step
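The joining rules documented for _append_step() and _join_paths() can be condensed into a few lines of plain Python. This is a sketch written to reproduce the documented examples, not the library's source; append_step and join_paths are hypothetical stand-ins.

```python
def append_step(steps, step):
    """Append one already-split `step` to the tuple `steps`."""
    steps = tuple(s for s in steps if s != ".")   # single dots carry no information
    if step == "":
        return ("",)                  # an empty step re-roots the path
    if step == ".":
        return steps
    if step == "..":
        if not steps or steps[-1] == "..":
            return steps + ("..",)    # relative path: keep climbing
        if steps[-1] == "":
            return steps              # cannot climb above the root
        return steps[:-1]
    return steps + (step,)


def join_paths(*steps):
    """Fold all steps with append_step() and join them with '/'."""
    joined = ()
    for step in steps:
        joined = append_step(joined, step)
    if not joined:
        return "."                    # an empty relative path
    return "/".join(joined)


print(join_paths("", "a", "b", "..", "bb", "cc"))
print(join_paths("a", "b", "", "r", "aa", "bb"))
print(join_paths("a", "..", "..", ".."))
```

Keeping the root as a leading empty step means that a plain '/'.join() yields the leading slash of absolute paths with no special casing.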
-
pandalone.mappings.pmods_from_tuples(pmods_tuples)[source]¶ Turns a list of 2-tuples into a pmods hierarchy.
Each tuple defines the renaming-or-relocation of the final part of some component path onto another one into value-trees, such as:
(/rename/path, foo)        --> rename/foo
(relocate/path, foo/bar)   --> relocate/foo/bar
The “from” path may be:
- relative,
- absolute (starting with /), or
- “anywhere” (starting with //).
In case a “step” in the “from” path starts with tilde (~), it is assumed to be a regular-expression, and the tilde is removed from it. The “to” path can make use of any “from” capture-groups:
('/~all(.*)/path', 'foo')
(r'~some[\d+]/path', 'foo\1')
('//~all(.*)/path', 'foo')
Parameters: pmods_tuples (list(tuple(str, str))) –
Returns: a root pmod
Return type: Pmod
Example:
>>> pmods_from_tuples([
...     ('/a', 'A1/A2'),
...     ('/a/b', 'B'),
... ])
pmod({'': pmod({'a': pmod('A1/A2', {'b': pmod('B')})})})
>>> pmods_from_tuples([
...     ('/~a*', 'A1/A2'),
...     ('/a/~b[123]', 'B'),
... ])
pmod({'': pmod({'a': pmod({re.compile('b[123]'): pmod('B')})}, {re.compile('a*'): pmod('A1/A2')})})
This is how you map root:
>>> pmods = pmods_from_tuples([
...     ('', 'relative/Root'),    ## Make all paths relatives.
...     ('/a/b', '/Rooted/B'),    ## But map `b` would be "rooted".
... ])
>>> pmods
pmod({'': pmod('relative/Root', {'a': pmod({'b': pmod('/Rooted/B')})})})
>>> pmods.map_path('/a/c')
'relative/Root/a/c'
>>> pmods.map_path('/a/b')
'/Rooted/B'
But note that ‘/’ maps the 1st “empty-str” step after root:
>>> pmods_from_tuples([
...     ('/', 'New/Root'),
... ])
pmod({'': pmod({'': pmod('New/Root')})})
TODO: Implement “anywhere” matches.
-
pandalone.mappings.pstep_from_df(columns_df, name_col='names')[source]¶ Creates Pstep instances from a dataframe.
Parameters: columns_df (pd.DataFrame) – pstep’s mapped-names in name_col column, indexed by paths, and any additional pstep-attributes in the rest columns.
Example:
======== ========= ===================
paths    names     renames
======== ========= ===================
/A       foo       ['FOO', 'LL']
/B       bar       []
======== ========= ===================
4.3. Module: pandalone.components¶
Defines the building-blocks of a “model”:
- components and assemblies: See Component, FuncComponent and Assembly.
- paths and path-mappings (pmods): See Pmod, pmods_from_tuples() and Pstep.
4.3.1. TODO¶
- Assembly use ComponentLoader collecting components with:
  - getattr() and filter_predicate default to attr.__name__.startswith('cfunc_').
  - enforce a disable flag on them.
- Component/assembly should have a stackable or common cwd?
- Components should be easy to run without “framework”.
  - _build() –> run()
  - pmods on init OR run()?
  - As ContextManager?
  - Imply a default Assembly.
-
class pandalone.components.Assembly(components, name=None)[source]¶
Bases: pandalone.components.Component
Example:
>>> def cfunc_f1(comp, value_tree):
...     comp.pinp().A
...     comp.pout().B
>>> def cfunc_f2(comp, value_tree):
...     comp.pinp().B
...     comp.pout().C
>>> ass = Assembly(FuncComponent(cfunc) for cfunc in [cfunc_f1, cfunc_f2])
>>> ass._build()
>>> assert list(ass._iter_validations()) == []
>>> ass._inp
['f1/A', 'f2/B']
>>> ass._out
['f1/B', 'f2/C']
>>> from pandalone.mappings import pmods_from_tuples
>>> pmod = pmods_from_tuples([
...     ('~.*', '/root'),
... ])
>>> ass._build(pmod)
>>> sorted(ass._inp + ass._out)
['/root/A', '/root/B', '/root/B', '/root/C']
-
class pandalone.components.Component(name)[source]¶
Bases: object
Encapsulates a function and its inputs/outputs dependencies.
It should be callable, and when executed it may read/modify the data-tree given as its 1st input.
An opportunity to fix the internal-state (i.e. inputs/output/name) is when _build() is invoked.
Mostly defined through cfuncs, which provide for defining a component with a single function with a special signature, see FuncComponent.
-
__metaclass__¶ alias of abc.ABCMeta
-
-
class pandalone.components.FuncComponent(cfunc, name=None)[source]¶
Bases: pandalone.components.Component
Converts a “cfunc” into a component.
A cfunc is a function that modifies the values-tree with this signature:
cfunc_XXXX(comp, vtree)
where:
- comp: the FuncComponent associated with the cfunc
- vtree: the part of the data-tree involving the values to be modified by the cfunc
It also works as a utility for developers of cfuncs, since it is passed as their 1st arg.
The cfuncs may use pinp() and pout() when accessing their input and output data-tree values respectively. Note that accessing any of those attributes from outside of a cfunc would result in an error.
If a cfunc accesses additional values with “fixed” paths, then it has to manually add those paths into the _inp and _out lists.
Example:
This would be a fully “relocatable” cfunc:
>>> def cfunc_calc_foobar_rate(comp, value_tree):
...     pi = comp.pinp()
...     po = comp.pout()
...
...     df = value_tree.get(pi)
...
...     df[po.Acc] = df[pi.V] / df[pi.T]
To get the unmodified component-paths, use:
>>> comp = FuncComponent(cfunc_calc_foobar_rate)
>>> comp._build()
>>> assert list(comp._iter_validations()) == []
>>> sorted(comp._inp + comp._out)
['calc_foobar_rate/Acc', 'calc_foobar_rate/T', 'calc_foobar_rate/V']
To get the path-modified component-paths, use:
>>> from pandalone.mappings import pmods_from_tuples
>>> pmods = pmods_from_tuples([
...     ('~.*', '/A/B'),
... ])
>>> comp._build(pmods)
>>> sorted(comp.pinp()._paths())
['/A/B/T', '/A/B/V']
>>> comp.pout()._paths()
['/A/B/Acc']
>>> sorted(comp._inp + comp._out)
['/A/B/Acc', '/A/B/T', '/A/B/V']
>>> comp._build(pmods)
>>> sorted(comp._inp + comp._out)
['/A/B/Acc', '/A/B/T', '/A/B/V']
4.4. Module: pandalone.pandata¶
A pandas-model is a tree of strings, numbers, sequences, dicts, pandas instances and resolvable
URI-references, implemented by Pandel.
-
class pandalone.pandata.JSONCodec[source]¶
Bases: object
Json coders/decoders capable of handling (almost) all python objects, by pickling them.
Example:
>>> import json
>>> import numpy as np
>>> import pandas as pd
>>> obj_list = [
...     3.14,
...     {
...         'aa': pd.DataFrame([]),
...         2: np.array([]),
...         33: {'foo': 'bar'},
...     },
...     pd.DataFrame(np.random.randn(10, 2)),
...     ('b', pd.Series({})),
... ]
>>> for o in obj_list + [obj_list]:
...     s = json.dumps(o, cls=JSONCodec.Encoder)
...     oo = json.loads(s, cls=JSONCodec.Decoder)
...     #assert trees_equal(o, oo)
...
See also
For pickle-limitations: https://docs.python.org/3.7/library/pickle.html#pickle-picklable
-
class
Decoder(*, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)[source]¶ Bases:
json.decoder.JSONDecoder
-
class
-
class pandalone.pandata.JSchema[source]¶
Bases: object
Facilitates the construction of json-schema-v4 nodes on PStep code.
It performs just a rudimentary args-name check. Further validations should apply using a proper json-schema validator.
Parameters:
- type – if omitted, derived as ‘object’ if it has children
- kws – for all the rest see http://json-schema.org/latest/json-schema-validation.html
-
class pandalone.pandata.ModelOperations[source]¶
Bases: pandalone.pandata.ModelOperations
Customization functions for traversing, I/O, and converting self-or-descendant branch (sub)model values.
-
static __new__(cls, inp=None, out=None, conv=None)[source]¶
Parameters:
- inp (list) – the args-list to Pandel._read_branch()
- out – The args to Pandel._write_branch(), that may be specified either as:
  - an args-list, that will apply for all model data-types (lists, dicts & pandas),
  - a map of type –> args-list, where the None key is the catch-all case,
  - a function returning the args-list for some branch-value, with signature: def get_write_branch_args(branch).
- conv – The conversion-functions (convertors) for the various model’s data-types. The convertors have signature def convert(branch), and they may be specified either as:
  - a map of (from_type, to_type) –> conversion_func(), where the None key is the catch-all case,
  - a “master-switch” function returning the appropriate convertor depending on the requested conversion. The master-function’s signature is def get_convertor(from_branch, to_branch).
  The minimum convertors demanded by Pandel are (at least, check the code for more):
  - DataFrame <–> dict
  - Series <–> dict
  - ndarray <–> list
-
static
-
class pandalone.pandata.Pandel(curate_funcs=())[source]¶
Bases: object
Builds, validates and stores a pandas-model, a mergeable stack of JSON-schema abiding trees of strings and numbers, assembled with
- sequences,
- dictionaries,
- pandas.DataFrame,
- pandas.Series, and
- URI-references to other model-trees.
Overview
The making of a model involves, among others, schema-validating, reading subtree-branches from URIs, cloning, converting and merging multiple sub-models in a single unified-model tree, without side-effecting given input. All these happen in 4+1 steps:
               ....................... Model Construction .................
  ------------ :  _______    ___________                                  :
 / top_model /==>|Resolve|->|PreValidate|-+                               :
 -----------'  : |___0___|  |_____1_____| |                               :
  ------------ :  _______    ___________  |   _____    ________    ______ :   --------
 / base-model/==>|Resolve|->|PreValidate|-+->|Merge|->|Validate|->|Curate|==>/ model /
 -----------'  : |___0___|  |_____1_____|    |_ 2__|  |___3____|  |__4+__|:  -------'
               ............................................................
All steps are executed “lazily” using generators (with yield). Before proceeding to the next step, the previous one must have completed successfully. That way, any ad-hoc code in building-step-5(curation), for instance, will not suffer a horrible death due to badly-formed data.
[TODO] The storing of a model simply involves distributing model parts into different files and/or formats, again without side-effecting the unified-model.
Building model
Here is a detailed description of each building-step:
1. _resolve() and substitute any json-references present in the submodels with content-fragments fetched from the referred URIs. The submodels are cloned first, to avoid side-effecting them.
   Although by default a combination of JSON and CSV files is expected, this can be customized, either by the content in the json-ref, within the model (see below), or as explained below.
   The extended json-refs syntax supported provides for passing arguments into _read_branch() and _write_branch() methods. The syntax is easier to explain by showing what the default _global_cntxt corresponds to, for a DataFrame:
   {
     "$ref": "http://example.com/example.json#/foo/bar",
     "$inp": ["AUTO"],
     "$out": ["CSV", "encoding=UTF-8"]
   }
   And here is what is required to read and (later) store into a HDF5 local file with a predefined name:
   {
     "$ref": "file://./filename.hdf5",
     "$inp": ["AUTO"],
     "$out": ["HDF5"]
   }
   Warning
   Step NOT IMPLEMENTED YET!
2. Loosely _prevalidate() each sub-model separately with json-schema, where any pandas-instances (DataFrames and Series) are left as is. It is the duty of the developer to ensure that the prevalidation-schema is loose enough that it allows for various submodel-forms, prior to merging, to pass.
3. Recursively clone and _merge() sub-models in a single unified-model tree. Branches from sub-models higher in the stack override the respective ones from the sub-models below, recursively. Different object types need to be converted appropriately (i.e. merging a dict with a DataFrame results into a DataFrame, so the dictionary has to convert to dataframe).
   The required conversions into pandas classes can be customized as explained below. Series and DataFrames cannot merge together, and Sequences do not merge with any other object-type (themselves included), they just “overwrite”.
   The default convertor-functions defined both for submodels and models are listed in the following table:
   =========== =========== ===============================
   From:       To:         Method:
   =========== =========== ===============================
   dict        DataFrame   pd.DataFrame (the constructor)
   DataFrame   dict        lambda df: df.to_dict('list')
   dict        Series      pd.Series (the constructor)
   Series      dict        lambda sr: sr.to_dict()
   =========== =========== ===============================
4. Strictly json-_validate() the unified-model (i.e. enforcing required schema-rules).
   The required conversions from pandas classes can be customized as explained below.
   The default convertor-functions are the same as above.
5. (Optionally) Apply the _curate() functions on the model to enforce dependencies and/or any ad-hoc generation-rules among the data. You can think of bash-like expansion patterns, like ${/some/path:=$HOME} or expressions like %len(../other/path).
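The override rule of the merge step (higher submodels win, mappings merge recursively, sequences just overwrite) can be sketched with plain dictionaries. This is a simplified illustration and not Pandel._merge(): the merge function is hypothetical, and it leaves out the pandas conversions and the deep cloning described above.

```python
from collections.abc import Mapping


def merge(base, top):
    """Return a new tree where `top` overrides `base`, recursively."""
    if isinstance(base, Mapping) and isinstance(top, Mapping):
        merged = dict(base)          # new dict, so neither input dict is modified
        for key, value in top.items():
            merged[key] = merge(base[key], value) if key in base else value
        return merged
    return top                       # scalars and sequences simply overwrite


base = {"a": "foo", "b": {"x": 1, "y": [1, 2]}}
top = {"a": "bar", "b": {"y": [3]}, "c": 2}
unified = merge(base, top)
print(unified)
```

Note how the list under b/y is replaced as a whole rather than concatenated, while the sibling key b/x from the base-model survives.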
Storing model
When storing model-parts, if unspecified, the filenames to write into will be deduced from the jsonpointer-path of the $out’s parent, by substituting “strange” chars with underscores(_).
Warning
Functionality NOT IMPLEMENTED YET!
Customization
Some operations within steps (namely conversion and IO) can be customized by the following means (from lower to higher precedence):
- The global-default ModelOperations instance on the _global_cntxt, applied on both submodels and unified-model.
  For example, to channel the whole reading/writing of models through the HDF5 data-format, it would suffice to modify the _global_cntxt like this:
  pm = FooPandelModel() ## some concrete model-maker
  io_args = ["HDF5"]
  pm.mod_global_operations(inp=io_args, out=io_args)
- [TODO] Extra-properties on the json-schema applied on both submodels and unified-model for the specific path defined. The supported properties are the non-functional properties of ModelOperations.
- Specific-properties regarding IO operations within each submodel - see the resolve building-step, above.
- Context-maps of json_paths –> ModelOperations instances, installed by add_submodel() and unified_contexts on the model-maker. They apply to the self-or-descendant subtree of each model.
  The json_path is a string obeying a simplified json-pointer syntax (no char-normalizations yet), i.e. /some/foo/1/pointer. An empty-string('') matches all model.
When multiple convertors match for a model-value, the selected convertor to be used is the most specific one (the one with longest prefix). For instance, on the model:
[ { "foo": { "bar": 0 } } ]
all of the following would match the 0 value:
- the global-default _global_cntxt,
- /, and
- /0/foo
but only the last’s context-props will be applied.
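The “longest prefix wins” rule can be illustrated with a few lines of standard Python. The sketch below is only an approximation of that rule: select_context, its arguments and the string placeholders standing in for ModelOperations instances are all hypothetical, and prefixes are compared step-by-step so that /0/foo does not match /0/foobar.

```python
def select_context(path, contexts, default):
    """Pick the value registered under the longest step-prefix of `path`."""
    steps = path.split("/")
    best, best_len = default, -1
    for ctx_path, ops in contexts.items():
        trimmed = ctx_path.rstrip("/")            # '' and '/' both match everything
        ctx_steps = trimmed.split("/") if trimmed else []
        if steps[:len(ctx_steps)] == ctx_steps and len(ctx_steps) > best_len:
            best, best_len = ops, len(ctx_steps)
    return best


contexts = {
    "": "root-ops",
    "/0/foo": "foo-ops",
    "/0/foobar": "unrelated-ops",
    "/1": "other-ops",
}
print(select_context("/0/foo/bar", contexts, default="global-ops"))
print(select_context("/2/x", contexts, default="global-ops"))
print(select_context("/2/x", {}, default="global-ops"))
```

For the path /0/foo/bar both '' and /0/foo match, and the longer one is selected. The default is returned only when no context-path matches at all.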
Attributes
-
model¶ The model-tree that will receive the merged submodels after build() has been invoked. Depending on the submodels, the top-value can be any of the supported model data-types.
-
_submodel_tuples¶ The stack of (submodel, path_ops) tuples. The list’s 1st element is the base-model, the last one, the top-model. Use add_submodel() to build this list.
-
_global_cntxt¶ A ModelOperations instance acting as the global-default context for the unified-model and all submodels. Use mod_global_operations() to modify it.
-
_curate_funcs¶ The sequence of curate functions to be executed as the final step by _curate(). They are “normal” functions (not generators) with signature:
def curate_func(model_maker):
    pass ## ie: modify ``model_maker.model``.
Better specify this list of functions at construction time.
-
_errored¶ An internal boolean flag that becomes True if any build-step has failed, to halt proceeding to the next one. It is None if build has not started yet.
Examples
The basic usage requires you to subclass your own model-maker, just so that a json-schema is provided for both validation-steps, 2 & 4:
>>> class MyModel(Pandel):
...     def _get_json_schema(self, is_prevalidation):
...         return {    ## Define the json-schema.
...             '$schema': 'http://json-schema.org/draft-04/schema#',
...             'required': [] if is_prevalidation else ['a', 'b'],  ## Prevalidation is more loose.
...             'properties': {
...                 'a': {'type': 'string'},
...                 'b': {'type': 'number'},
...                 'c': {'type': 'number'},
...             }
...         }
Then you can instantiate it and add your submodels:
>>> mm = MyModel()
>>> mm.add_submodel({"a": 'foo', "b": 1})              ## submodel-1 (base)
>>> mm.add_submodel(pd.Series({"a": "bar", "c": 2}))   ## submodel-2 (top-model)
You then have to build the final unified-model (any validation errors would be reported at this point):
>>> mdl = mm.build()
Note that you can also access the unified-model in the model attribute. You can now interrogate it:
>>> mdl['a'] == 'bar'   ## Value overridden by top-model
True
>>> mdl['b'] == 1       ## Value left intact from base-model
True
>>> mdl['c'] == 2       ## New value from top-model
True
Let’s try to build with invalid submodels:
>>> mm = MyModel()
>>> mm.add_submodel({'a': 1})          ## According to the schema, this should have been a string,
>>> mm.add_submodel({'b': 'string'})   ## and this one, a number.
>>> sorted(mm.build_iter(), key=lambda ex: ex.message)  ## Fetch a list with all validation errors. # doctest: +NORMALIZE_WHITESPACE
[<ValidationError: "'string' is not of type 'number'">,
 <ValidationError: "1 is not of type 'string'">,
 <ValidationError: 'Gave-up building model after step 1.prevalidate (out of 4).'>]
>>> mdl = mm.model
>>> mdl is None   ## No model constructed, failed before merging.
True
And let’s try to build with valid submodels but an invalid merged-one:
>>> mm = MyModel()
>>> mm.add_submodel({'a': 'a str'})
>>> mm.add_submodel({'c': 1})
>>> sorted(mm.build_iter(), key=lambda ex: ex.message)  # doctest: +NORMALIZE_WHITESPACE
[<ValidationError: "'b' is a required property">,
 <ValidationError: 'Gave-up building model after step 3.validate (out of 4).'>]
-
__init__(curate_funcs=())[source]¶ Parameters: curate_funcs (sequence) – See _curate_funcs.
-
__metaclass__¶ alias of
abc.ABCMeta
-
_curate()[source]¶ Step-4: Invokes any curate-functions found in
_curate_funcs.
-
_get_json_schema(is_prevalidation)[source]¶
Returns: a json schema, more loose when prevalidation, for each case
Return type: dictionary
-
_select_context(path, branch)[source]¶ Finds which context to use while visiting model-nodes, by enforcing the precedence-rules described in the Customizations.
Returns: the selected ModelOperations
-
add_submodel(model, path_ops=None)[source]¶ Pushes on top a submodel, along with its context-map.
Parameters:
- model – the model-tree (sequence, mapping, pandas-types)
- path_ops (dict) – A map of json_paths –> ModelOperations instances acting on the unified-model. The path_ops may often be empty.
Examples
To change the default DataFrame –> dictionary convertor for a submodel, use the following:
>>> mdl = {'foo': 'bar'}
>>> submdl = ModelOperations(mdl, conv={(pd.DataFrame, dict): lambda df: df.to_dict('records')})
-
build()[source]¶ Attempts to build the model by exhausting build_iter(), or raises its 1st error.
Use this method when you do not want to waste time getting the full list of errors.
-
build_iter()[source]¶ Iteratively build model, yielding any problems as ValidationError instances.
For debugging, the unified model at model may contain intermediate results at any time, even if construction has failed. Check the _errored flag if necessary.
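The difference between build() and build_iter() follows a common generator pattern: each step reports its problems by yielding them, and the pipeline gives up after the first step that reported any. The sketch below illustrates only that pattern. It is not Pandel's code, and run_steps, prevalidate and merge are hypothetical toy functions.

```python
def run_steps(state, steps):
    """Run (name, step) pairs in order, yielding errors and halting after a failed step."""
    for index, (name, step) in enumerate(steps):
        errors = list(step(state))    # a step must finish before the next one starts
        yield from errors
        if errors:
            yield f"Gave-up after step {index}.{name}."
            return


calls = []


def prevalidate(submodels):
    calls.append("prevalidate")
    for sub in submodels:
        if not isinstance(sub.get("a", ""), str):
            yield f"{sub['a']!r} is not of type 'string'"


def merge(submodels):
    calls.append("merge")
    return iter(())                   # this toy step never fails


steps = [("prevalidate", prevalidate), ("merge", merge)]

# Like build_iter(): collect every reported problem.
errors = list(run_steps([{"a": 1}, {"b": 2}], steps))
print(errors)

# Like build(): stop at the first problem, if any.
first_error = next(run_steps([{"a": 1}], steps), None)
print(first_error)
```

Because the failing step ends the generator, the merge step is never entered with badly-formed input.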
-
mod_global_operations(operations=None, **cntxt_kwargs)[source]¶ Since it is the fall-back operation for conversions and IO operation, it must exist and have all its props well-defined for the class to work correctly.
Parameters:
- operations (ModelOperations) – Replaces values of the installed context with non-empty values from this one.
- cntxt_kwargs – Replaces the keyworded-values on the existing operations. See ModelOperations for supported keywords.
-
unified_contexts¶ A map of json_paths –> ModelOperations instances acting on the unified-model.
-
pandalone.pandata.PandelVisitor(schema, resolver=None, format_checker=None, auto_default: Optional[bool] = True, auto_default_nulls: Optional[bool] = False, auto_remove_nulls: Optional[bool] = False)[source]¶ A customized jsonschema-validator supporting instance-trees with pandas and numpy objects, natively.
Parameters:
- auto_default – When the tri-state bool autoDefault in schema or this param are enabled, it applies any schema’s default value if a property is missing and schema’s type does not support nulls.
  - Independent of auto_default_nulls (you may enable both).
  - See meth: _rule_auto_defaults_properties.
- auto_default_nulls – When the tri-state bool autoDefaultNull in schema or this param are enabled, it applies any schema’s default value if the property is null and schema’s type does not support nulls.
  - Independent of auto_default (you may enable both).
  - Takes precedence over auto_remove_nulls.
  - See meth: _rule_auto_defaults_properties.
- auto_remove_nulls – When the tri-state bool autoRemoveNull in schema or this param are enabled, it removes a null property value if the schema’s type does not accept nulls.
  - See meth: _rule_auto_defaults_properties.
Attention
If this is enabled, any required properties rule must FOLLOW the properties rule.
Any pandas or numpy instance (for example obj) is treated like that:
Python Type: JSON Equivalence
- pandas.DataFrame: as object json-type, with keys: obj.columns (MUST be strings) and values: obj[col].values. NOTE: len(df) on rows(!), not columns.
- pandas.Series: as object json-type, with keys: obj.index (MUST be strings) and values: obj.values; or as array json-type
- np.ndarray: as array json-type IF ndim == 1
- cabc.Sequence: as array IF not string (like lists, tuples)
Note that the value of each DataFrame column is an ndarray instance.
The simplest validation of an object or a pandas-instance is like this:
>>> import pandas as pd
>>> schema = {
...     'type': 'object',
... }
>>> pv = PandelVisitor(schema)
>>> pv.validate({'foo': 'bar'})
>>> pv.validate(pd.Series({'foo': 1}))
>>> pv.validate([1,2])   ## A sequence is invalid here.
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: [1, 2] is not of type 'object'
<BLANKLINE>
Failed validating 'type' in schema:
    {'type': 'object'}
<BLANKLINE>
On instance:
    [1, 2]
Or demanding specific properties with required and no additionalProperties:
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'foo': {}
...     },
...     'required': ['foo'],
...     'additionalProperties': False,
... }
>>> pv = PandelVisitor(schema)
>>> pv.validate(pd.Series({'foo': 1}))
>>> pv.validate(pd.Series({'foo': 1, 'bar': 2}))   ## Additional 'bar' is present!
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: Additional properties are not allowed ('bar' was unexpected)
<BLANKLINE>
Failed validating 'additionalProperties' in schema:
    {'additionalProperties': False,
     'properties': {'foo': {}},
     'required': ['foo'],
     'type': 'object'}
<BLANKLINE>
On instance:
    foo    1
    bar    2
    dtype: int64
>>> pv.validate(pd.Series({}))   ## Required 'foo' missing!
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'foo' is a required property
<BLANKLINE>
Failed validating 'required' in schema:
    {'additionalProperties': False,
     'properties': {'foo': {}},
     'required': ['foo'],
     'type': 'object'}
<BLANKLINE>
On instance:
    Series([], dtype: float64)
- auto_default –
-
pandalone.pandata._U¶ alias of
pandalone.pandata.United
-
pandalone.pandata._find_additional_properties(instance, schema)[source]¶
Return the set of additional properties for the given instance.
Weeds out properties that should have been validated by properties and/or patternProperties.
Assumes instance is dict-like already.
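The weeding-out described above can be sketched with the standard library alone. This is a minimal illustration, not pandalone's code: the name find_additional_properties_sketch is hypothetical, and it assumes patternProperties keys are regular expressions searched against each property name, as in JSON Schema.

```python
import re


def find_additional_properties_sketch(instance, schema):
    """Yield keys of `instance` covered by neither `properties` nor `patternProperties`."""
    properties = schema.get("properties", {})
    patterns = list(schema.get("patternProperties", {}))
    for key in instance:
        if key in properties:
            continue  # handled by the `properties` rule
        if any(re.search(pattern, key) for pattern in patterns):
            continue  # handled by a `patternProperties` regex
        yield key


schema = {
    "properties": {"foo": {}},
    "patternProperties": {"^x_": {}},
}
extras = sorted(find_additional_properties_sketch({"foo": 1, "x_a": 2, "bar": 3}, schema))
print(extras)
```

Only 'bar' is left over here, because 'foo' is declared explicitly and 'x_a' matches the '^x_' pattern.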
-
pandalone.pandata._rule_auto_defaults_properties(validator, properties, instance, schema, original_props_rule, auto_default, auto_default_nulls, auto_remove_nulls)[source]¶ Adapted from: https://python-jsonschema.readthedocs.io/en/stable/faq/#frequently-asked-questions
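As a rough illustration of what a defaults-filling properties rule does, here is a stand-alone sketch that works on plain dicts without jsonschema. The names apply_defaults_sketch and accepts_null are hypothetical, and the exact semantics of auto_default_nulls are an assumption (taken here to mean "also replace explicit None values with the default"); only the null-removal behaviour is described in the parameter documentation.

```python
def accepts_null(subschema):
    """Return True if the sub-schema's `type` allows null (or is absent)."""
    if "type" not in subschema:
        return True  # no type constraint: any value, including null, passes
    types = subschema["type"]
    if isinstance(types, str):
        types = [types]
    return "null" in types


def apply_defaults_sketch(properties, instance,
                          auto_default_nulls=False, auto_remove_nulls=False):
    """Fill schema defaults into the dict `instance` in place and return it."""
    for prop, subschema in properties.items():
        if "default" in subschema:
            missing = prop not in instance
            is_null = not missing and instance[prop] is None
            # ASSUMPTION: auto_default_nulls treats an explicit None like a missing value.
            if missing or (auto_default_nulls and is_null):
                instance[prop] = subschema["default"]
        # Drop nulls that the property's `type` would reject anyway.
        if auto_remove_nulls and prop in instance and instance[prop] is None:
            if not accepts_null(subschema):
                del instance[prop]
    return instance


properties = {
    "a": {"type": "integer", "default": 5},
    "b": {"type": "string"},
    "c": {"type": ["string", "null"]},
}
doc = apply_defaults_sketch(properties, {"b": None, "c": None}, auto_remove_nulls=True)
print(doc)
```

Removing a property this way is why the ordering caveat matters: a required rule that runs before the properties rule would still see the null-valued key.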
-
pandalone.pandata._units_cleaner_regex = re.compile('^[<[]|[\\]>]$')¶
An item-descriptor with units, i.e. used as a table-column header.
-
pandalone.pandata.first_defined(*var, default=None)[source]¶
Return the 1st non-None var, or default.
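A helper with this contract fits in one expression. The sketch below uses a hypothetical name and is not the library's source; it shows that falsy-but-defined values such as 0 are still picked.

```python
def first_defined_sketch(*values, default=None):
    """Return the first argument that is not None, else `default`."""
    return next((value for value in values if value is not None), default)


picked = first_defined_sketch(None, 0, 7)                     # 0 is falsy but defined
fallback = first_defined_sketch(None, None, default="n/a")    # nothing defined
print(picked, fallback)
```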
-
pandalone.pandata.iter_jsonpointer_parts(jsonpath)[source]¶
Generates the jsonpath parts according to jsonpointer spec.
Parameters: jsonpath (str) – a jsonpath to resolve within document
Returns: The parts of the path as a generator, without converting any step to int, and None if None.
Author: Julian Berman, ankostis
Examples:
>>> list(iter_jsonpointer_parts('/a/b'))
['a', 'b']

>>> list(iter_jsonpointer_parts('/a//b'))
['a', '', 'b']

>>> list(iter_jsonpointer_parts('/'))
['']

>>> list(iter_jsonpointer_parts(''))
[]
But paths are strings beginning (not implemented: but not ending) with a slash (‘/’):
>>> list(iter_jsonpointer_parts(None))
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'split'

>>> list(iter_jsonpointer_parts('a'))
Traceback (most recent call last):
jsonschema.exceptions.RefResolutionError: Jsonpointer-path(a) must start with '/'!

#>>> list(iter_jsonpointer_parts('/a/'))
#Traceback (most recent call last):
#jsonschema.exceptions.RefResolutionError: Jsonpointer-path(a) must NOT ends with '/'!
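For readers unfamiliar with the jsonpointer spec (RFC 6901), the splitting shown in the examples can be sketched as follows. The function name is hypothetical, and the sketch raises ValueError rather than jsonschema's RefResolutionError so that it runs without third-party packages. The unescaping step follows the RFC; whether the library applies it at this stage is not stated here.

```python
def iter_pointer_parts_sketch(pointer):
    """Yield the reference tokens of an absolute JSON pointer, as strings."""
    if pointer == "":
        return  # the empty pointer addresses the whole document: no parts
    if not pointer.startswith("/"):
        raise ValueError("JSON pointer %r must start with '/'" % pointer)
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: '~1' stands for '/', '~0' for '~' (decode in this order).
        yield token.replace("~1", "/").replace("~0", "~")


print(list(iter_pointer_parts_sketch("/a//b")))
print(list(iter_pointer_parts_sketch("/a~1b/~01")))
```

Because this is a generator, the check on the leading slash only fires once iteration starts, which is why the examples wrap the call in list().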
-
pandalone.pandata.iter_jsonpointer_parts_relaxed(jsonpointer)[source]¶
Like iter_jsonpointer_parts() but also accepting non-absolute paths.
The 1st step of absolute-paths is always ‘’.
Examples:
>>> list(iter_jsonpointer_parts_relaxed('a'))
['a']
>>> list(iter_jsonpointer_parts_relaxed('a/'))
['a', '']
>>> list(iter_jsonpointer_parts_relaxed('a/b'))
['a', 'b']

>>> list(iter_jsonpointer_parts_relaxed('/a'))
['', 'a']
>>> list(iter_jsonpointer_parts_relaxed('/a/'))
['', 'a', '']

>>> list(iter_jsonpointer_parts_relaxed('/'))
['', '']

>>> list(iter_jsonpointer_parts_relaxed(''))
['']
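The outputs above are what a plain split on '/' produces, which suggests a sketch like the one below. The function name is hypothetical, and the '~' unescaping is an assumption carried over from RFC 6901, not something the examples demonstrate.

```python
def iter_pointer_parts_relaxed_sketch(pointer):
    """Yield the steps of an absolute or relative path, keeping empty steps."""
    for token in pointer.split("/"):
        # ASSUMPTION: tokens are unescaped as in RFC 6901 ('~1' -> '/', '~0' -> '~').
        yield token.replace("~1", "/").replace("~0", "~")


relative = list(iter_pointer_parts_relaxed_sketch("a/b"))
absolute = list(iter_pointer_parts_relaxed_sketch("/a/b"))
print(relative, absolute)
```

A caller can then tell the two kinds apart by the leading empty step that only absolute paths produce.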
-
pandalone.pandata.parse_value_with_units(arg)[source]¶ Parses name-units pairs (i.e. used as a table-column header).
Returns: a United(name, units) named-tuple, or None if bad syntax; note that name='' but units=None when missing.
Examples:
>>> parse_value_with_units('value [units]')
United(name='value', units='units')

>>> parse_value_with_units('foo bar <bar/krow>')
United(name='foo bar', units='bar/krow')

>>> parse_value_with_units('no units')
United(name='no units', units=None)

>>> parse_value_with_units('')
United(name='', units=None)
But notice:
>>> assert parse_value_with_units('ok but [bad units') is None

>>> parse_value_with_units('<only units>')
United(name='', units='only units')

>>> parse_value_with_units(None)
Traceback (most recent call last):
TypeError: expected string or ...
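One way to get the behaviour shown in these examples is a single regular expression. The sketch below is an independent illustration with hypothetical names (parse_name_units_sketch, _NAME_UNITS); it redefines a stand-in United named-tuple and does not use the library's _units_cleaner_regex.

```python
import re
from collections import namedtuple

# Stand-in for the `United(name, units)` named-tuple mentioned above.
United = namedtuple("United", "name units")

# A bracket-free name, optionally followed by units wrapped in [...] or <...>.
# Simplification: mismatched pairs such as "[units>" are accepted too.
_NAME_UNITS = re.compile(
    r"^\s*(?P<name>[^\[\]<>]*?)\s*"
    r"(?:[\[<](?P<units>[^\[\]<>]*)[\]>])?\s*$"
)


def parse_name_units_sketch(text):
    """Return United(name, units), or None when `text` has unbalanced brackets."""
    match = _NAME_UNITS.match(text)
    if match is None:
        return None
    return United(match.group("name"), match.group("units"))


print(parse_name_units_sketch("foo bar <bar/krow>"))
print(parse_name_units_sketch("ok but [bad units"))
```

The optional units group is what makes units=None (and not an empty string) when no brackets are present.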
-
pandalone.pandata.resolve_jsonpointer(doc, jsonpointer, default=<object object>)[source]¶
Resolve a jsonpointer within the referenced doc.
Parameters:
- doc – the referent document
- jsonpointer (str) – a jsonpointer to resolve within document
- default – A value to return if path does not resolve.
Returns: the resolved doc-item or raises RefResolutionError
Raises: RefResolutionError (if cannot resolve path and no default)
Examples:
>>> dt = {
...     'pi':3.14,
...     'foo':'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc':'def'}),
...     }
... }
>>> resolve_jsonpointer(dt, '/pi', default=_scream)
3.14

>>> resolve_jsonpointer(dt, '/pi/BAD')
Traceback (most recent call last):
jsonschema.exceptions.RefResolutionError: Unresolvable JSON pointer('/pi/BAD')@(BAD)

>>> resolve_jsonpointer(dt, '/pi/BAD', 'Hi!')
'Hi!'
Author: Julian Berman, ankostis
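The default=<object object> in the signature is a sentinel, which lets callers pass None as a legitimate default. A minimal sketch of that pattern over plain dicts and lists follows; the names resolve_pointer_sketch and _MISSING are hypothetical, pandas instances are not handled, and LookupError stands in for RefResolutionError.

```python
from collections.abc import Mapping, Sequence

_MISSING = object()  # sentinel: distinguishes "no default given" from default=None


def resolve_pointer_sketch(doc, pointer, default=_MISSING):
    """Walk `doc` along an absolute JSON pointer, returning `default` on failure."""
    if pointer and not pointer.startswith("/"):
        raise ValueError("JSON pointer %r must start with '/'" % pointer)
    node = doc
    for raw in pointer.split("/")[1:]:
        part = raw.replace("~1", "/").replace("~0", "~")
        try:
            if isinstance(node, Mapping):
                node = node[part]
            elif isinstance(node, Sequence) and not isinstance(node, str):
                node = node[int(part)]  # list steps are integer indices
            else:
                raise KeyError(part)  # scalars have no children
        except (KeyError, IndexError, ValueError):
            if default is _MISSING:
                raise LookupError(
                    "Unresolvable JSON pointer(%r)@(%s)" % (pointer, part)
                ) from None
            return default
    return node


doc = {"pi": 3.14, "sub": {"items": ["a", "b"]}}
print(resolve_pointer_sketch(doc, "/sub/items/1"))
print(resolve_pointer_sketch(doc, "/pi/BAD", "Hi!"))
```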
-
pandalone.pandata.resolve_path(doc, path, default=<object object>, root=None)[source]¶
Like resolve_jsonpointer(), but also for relative-paths & attribute-branches.
Parameters:
- doc – the referent document
- path (str) – An absolute or relative path to resolve within document.
- default – A value to return if path does not resolve.
- root – Document for absolute paths, assumed doc if missing.
Returns: the resolved doc-item or raises RefResolutionError
Raises: RefResolutionError (if cannot resolve path and no default)
Examples:
>>> dt = {
...     'pi':3.14,
...     'foo':'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc':'def'}),
...     }
... }
>>> resolve_path(dt, '/pi', default=_scream)
3.14

>>> resolve_path(dt, 'df/V')
0    1.0
1    1.0
2    1.0
Name: V, dtype: float64

>>> resolve_path(dt, '/pi/BAD', 'Hi!')
'Hi!'
Author: Julian Berman, ankostis
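The two additions over resolve_jsonpointer(), relative paths and attribute-branches, can be sketched as below. This is an assumption-laden illustration with hypothetical names: absolute paths start from root (or doc when root is missing), relative paths start from doc, and a step that is neither a mapping key nor a list index falls back to getattr.

```python
from collections.abc import Mapping, Sequence
from types import SimpleNamespace

_MISSING = object()  # sentinel for "no default given"


def _descend(node, step):
    """Follow one step: mapping key, sequence index, or attribute."""
    if isinstance(node, Mapping):
        return node[step]
    if isinstance(node, Sequence) and not isinstance(node, str):
        return node[int(step)]
    return getattr(node, step)  # attribute-branch


def resolve_path_sketch(doc, path, default=_MISSING, root=None):
    """Resolve an absolute path against `root` (or `doc`), a relative one against `doc`."""
    if not path:
        return doc
    steps = path.split("/")
    node = doc
    if path.startswith("/"):
        node = doc if root is None else root
        steps = steps[1:]  # drop the empty step before the leading '/'
    for step in steps:
        try:
            node = _descend(node, step)
        except (KeyError, IndexError, ValueError, AttributeError):
            if default is _MISSING:
                raise LookupError("Unresolvable path(%r)@(%s)" % (path, step)) from None
            return default
    return node


tree = {"pi": 3.14, "sub": SimpleNamespace(name="leaf")}
print(resolve_path_sketch(tree, "sub/name"))             # attribute-branch
print(resolve_path_sketch(tree["sub"], "/pi", root=tree))  # absolute, via root
```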
-
pandalone.pandata.rule_enum(validator, enums, instance, schema)[source]¶ Overridden to evade pandas-equals after Julian/jsonschema#575 fixed bool != 0,1 (v3.0.2).
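The "bool != 0,1" part of that note refers to Python treating True == 1 and False == 0, which JSON Schema does not. The sketch below shows only that distinction, with hypothetical names; it does not reproduce the pandas-specific comparison that the override works around.

```python
def strict_equal(a, b):
    """Equality that keeps booleans distinct from the integers 0 and 1."""
    if isinstance(a, bool) != isinstance(b, bool):
        return False  # exactly one side is a bool: never equal
    return a == b


def in_enum_sketch(instance, enums):
    """Return True if `instance` strictly equals one of the `enums` values."""
    return any(strict_equal(instance, allowed) for allowed in enums)


print(1 in [True, "x"])                  # plain Python membership
print(in_enum_sketch(1, [True, "x"]))    # strict membership
```

Plain membership accepts 1 for an enum of [True, "x"], while the strict check rejects it.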
-
pandalone.pandata.set_jsonpointer(doc, jsonpointer, value, object_factory=<class 'dict'>)[source]¶
Resolve a jsonpointer within the referenced doc.
Parameters:
- doc – the referent document
- jsonpointer (str) – a jsonpointer to the node to modify
Raises: RefResolutionError (if jsonpointer empty, missing, invalid-content)
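A setter along these lines typically walks to the parent of the target node and creates intermediate containers on the way. The sketch below assumes that is what object_factory is for; the name set_pointer_sketch is hypothetical, only dicts and lists are handled, and ValueError stands in for RefResolutionError.

```python
from collections.abc import MutableMapping


def set_pointer_sketch(doc, pointer, value, object_factory=dict):
    """Set `value` at `pointer` inside `doc`, creating missing parents."""
    if not pointer or not pointer.startswith("/"):
        raise ValueError("Expected a non-empty pointer starting with '/', got %r" % (pointer,))
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer.split("/")[1:]]
    node = doc
    for part in parts[:-1]:
        if isinstance(node, list):
            node = node[int(part)]
        elif isinstance(node, MutableMapping):
            # ASSUMPTION: missing (or None) parents are created with object_factory.
            if node.get(part) is None:
                node[part] = object_factory()
            node = node[part]
        else:
            raise ValueError("Cannot descend into %r at step %r" % (node, part))
    last = parts[-1]
    if isinstance(node, list):
        node[int(last)] = value
    elif isinstance(node, MutableMapping):
        node[last] = value
    else:
        raise ValueError("Cannot set %r on %r" % (last, node))
    return doc


target = {"items": [1, 2]}
set_pointer_sketch(target, "/a/b", 1)      # creates the intermediate dict 'a'
set_pointer_sketch(target, "/items/1", 9)  # overwrites an existing list item
print(target)
```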