The Database Operator Class
- class dwopt.db._Db(eng)
Generic database operator class. There are 3 main usages:
Run sql statment.
Run DDL/DML via the convenience methods.
Create query object, which allows running summary query.
This base class should not be instantiated directly by user , it’s child classes relevant to varies databases should be instantiated and used instead. Child classes:
dwopt.db.Pg: Postgre database operator class.
dwopt.db.Lt: Sqlite database operator class.
dwopt.db.Oc: Oracle database operator class.
The operator objects:
dwopt.pg: postgre.
dwopt.lt: sqlite.
dwopt.oc: Oracle.
The operator constructors:
dwopt.Pg(eng): Postgre.
dwopt.Lt(eng): Sqlite.
dwopt.Oc(eng): Oracle.
- Parameters
eng (sqlalchemy engine) – Database connection engine to be used.
- eng
Database connection engine that is used.
- Type
sqlalchemy engine
- add_pkey(tbl_nme, pkey)
Make and run an add primary key statement. Example sql code:
alter table tbl_nme add primary key (col1, col2, ...)
- Parameters
tbl_nme (str) – Name of the database table to operate on.
tbl_nme – columns names in form
col1, col2, ...
.
Examples
Make example table on sqlite, then add primary key constraint.
>>> from dw import lt >>> >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'text'}) >>> lt.add_pkey('col1,col2')
- create(tbl_nme, dtypes=None, **kwargs)
Make and run a create table statment. Example sql code:
create table tbl_nme( ,col1 dtype1 ,col2 dtype2 ... )
- Parameters
tbl_nme (str) – Name of the table to create.
dtypes (dict, optional) – Dictionary of column names to data types mappings.
**kwargs – Convenient way to add mappings. Keyword to argument mappings will be added to the dtypes dictionary. The keyword cannot be one of the positional parameter names.
Notes
Datatypes
Datatypes varies across databses, common example below:
Type
Sqlite
Postgre
Oracle
integer
integer
bigint
number
float
real
float8
float
string
text
varchar(20)
varchar2(20)
datetime
text
timestamp
timestamp
date
text
date
date
Details:
Other statements
The dtypes mappings also allow other sql statements which are part of a create statement to be added. For example a primary key constraint.
Details:
https://www.postgresql.org/docs/current/sql-createtable.html
https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/CREATE-TABLE.html
Examples
Make example table on sqlite.
>>> from dw import lt ... >>> lt.drop('test') >>> lt.create('test' ... ,{ ... 'id':'integer' ... ,'score':'real' ... ,'amt':'integer' ... ,'cat':'text' ... ,'time':'text' ... ,'constraint df_pk': ... 'primary key (id)' ... })
- drop(tbl_nme)
Make and run a drop table statement. Does not throw error if table not exist. Example sql code:
drop tbl_nme
Oracle sql code:
drop tbl_nme purge
- Parameters
tbl_nme (str) – Name of the database table to drop.
- Returns
tbl_nme not exist
if table not exist ,tbl_nme dropped
otherwise.- Return type
str
Notes
Error catching
Does not catch specifically the error related to table not exist in this version.
Examples
Make example table on sqlite, then drop it.
>>> from dw import lt >>> >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'text'}) >>> lt.drop('test')
- list_cons()
List all constraints.
- Return type
pandas.DataFrame
Notes
Postgre sql used:
select * from information_schema.constraint_table_usage
Details:
https://www.postgresql.org/docs/current/infoschema-constraint-table-usage.html
- list_tables(owner)
List all tables on database or specified schema.
- Parameters
owner (str) – Only applicable for oracle. Name of the schema.
- Return type
pandas.DataFrame
Notes
Postgre sql used:
select table_catalog,table_schema,table_name ,is_insertable_into,commit_action from information_schema.tables where table_schema not in ('information_schema','pg_catalog')
Sqlite sql used:
select * from sqlite_schema where type ='table' and name NOT LIKE 'sqlite_%'
Oracle sql used:
select/*+PARALLEL (4)*/ owner,table_name ,max(column_name),min(column_name) from all_tab_columns where owner = ':owner' group by owner,table_name
Details:
https://www.postgresql.org/docs/current/infoschema-tables.html https://www.sqlite.org/schematab.html https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/ALL_TAB_COLUMNS.html
- qry(*args, **kwargs)
Make query object. Different database operator object method returns different query object.
- Parameters
*args – Positional arguments of the query object.
**kwargs – keyword arguments of the query object.
- Return type
Examples
Make query object from sqlite database operator object.
>>> from dwopt import lt >>> lt.qry("test").where("x>5").print() select * from test where x>5
- run(sql=None, args=None, pth=None, mods=None, **kwargs)
Run sql statement.
Support argument passing, text replacement and reading statements from sql script.
- Parameters
sql (str, optional) – The sql statement to run. Only 1 statement is allowed.
args (dict, optional) – Dictionary of argument name str to argument str mappings. These arguments are passed to the database, to function as data for the argument names. See the notes and the examples section for details.
pth (str, optional) – Path to sql script, ignored if the sql parameter is not None. The script can hold one and only one sql statement, typically a significant piece of table creation statement.
mods (dict, optional) – Dictionary of modification name str to modification str mappings. Replaces modification name in the sql by the respective modification str. See the notes and the examples section for details.
**kwargs – Convenient way to add modification mappings. Keyword to argument mappings will be added to the mods dictionary. The keyword cannot be one of the positional parameter names.
- Returns
Returns dataframe if the database returns any result. Returns dataframe with column names and zero rows if running query that returns zero rows. Returns None otherwise, typically when running DDL/DML statement.
- Return type
pandas.DataFrame or None
Notes
The args and the mods parameter
An argument name is denoted in the sql by prepending a colon symbol
:
before a str.Similiarly, a modification name is denoted by prepending a colon symbol
:
before a str in the sql. The end of str is to be followed by a symbol other than a lower or upper case letter, or a number. It is also ended before a line break.The args parameter method of passing arguments is less prone to unintended sql injection, while the mods paramter method of text replacement gives much more flexibility when it comes to programatically generate sql statment. For example when database does not support argument passing on DDL/DML statements.
Examples
Make example table on sqlite.
>>> import pandas as pd >>> from dw import lt >>> >>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'int'}) >>> lt.write(tbl,'test')
Run sql.
>>> lt.run("select * from test") col1 col2 0 1 3 1 2 4
Run sql with argument passing.
>>> lt.run("select * from test where col1 = :cond",args = {'cond':2}) col1 col2 0 2 4
Run sql with text modification.
>>> tbl_nme = 'test2' >>> lt.run("drop table :tbl",mods = {'tbl':tbl_nme}) >>> lt.run("create table :tbl as select * from test where :col = 1" >>> , mods = {'tbl':tbl_nme}, col = 'col1') >>> lt.run("select *,col2 + 1 as :col from :tbl_nme" >>> , tbl_nme = tbl_nme, col = 'col3') col1 col2 col3 0 1 3 4
- table_cols(sch_tbl_nme)
Show information of specified table’s columns.
Notes
Postgre sql used:
select column_name, data_type from information_schema.columns where table_schema = ':schema_nme' and table_name = ':tbl_nme'
Oracle sql used:
select/*+PARALLEL (4)*/ * from all_tab_columns where owner = ':schema_nme' and table_name = ':tbl_nme'
Details:
https://www.postgresql.org/docs/current/infoschema-columns.html https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/ALL_TAB_COLUMNS.html
- Parameters
sch_tbl_nme (str) – Table name in format: schema.table.
- Return type
pandas.DataFrame
- table_sizes()
List sizes of all tables in current schema.
- Return type
pandas.DataFrame
Notes
Oracle sql used:
select/*+PARALLEL (4)*/ tablespace_name,segment_type,segment_name ,sum(bytes)/1024/1024 table_size_mb from user_extents group by tablespace_name,segment_type,segment_name
Details:
https://docs.oracle.com/en/database/oracle/oracle-database/21/refrn/USER_EXTENTS.html
- write(tbl, tbl_nme)
Make and run a insert statement. Example sql code:
insert into tbl_nme (col1, col2, ...) values (:col1, :col2, ...)
With arguments to sql being:
{ ['col1' : data1, 'col2' : data2, ...] ,['col1' : data3, 'col2' : data4, ...] ... }
- Parameters
tbl (pandas.DataFrame) – Payload Dataframe with data to insert.
tbl_nme (str) – Name of the database table to insert into.
Notes
Datetime
Datetime objects are converted into str before inserting.
Null values
Behaviour with null values are not tested for current version.
Examples
Make example table on sqlite, then insert rows into it.
>>> import pandas as pd >>> from dw import lt >>> >>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'text'}) >>> lt.write(tbl,'test')
- write_nodup(tbl, tbl_nme, pkey, where=None)
Make and run a insert statement without creating duplicates on the database table. Implemented as below process:
Make and run a select statement with optionally provided where clause.
If step 1 returns any results and the payload table in non-empty , remove duplicates on the payload table, using the provided primary key columns as judge of duplication.
Make insert statement on the non-duplicating payload data.
Example sql code:
select * from tbl_nme where where_clause; insert into tbl_nme (col1, col2, ...) values (:col1, :col2, ...)
With arguments to sql being:
{ ['col1' : data1, 'col2' : data2, ...] ,['col1' : data3, 'col2' : data4, ...] ... }
- Parameters
tbl (pandas.DataFrame) – Payload Dataframe with data to insert.
pkey ([str]) – Iterable of column name str.
tbl_nme (str) – Name of the database table to insert into.
where (str) – where clause in str form. The
where
keyword is not needed.
Examples
Make example table on sqlite, then insert duplicated rows into it.
>>> import pandas as pd >>> from dw import lt >>> >>> tbl = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']}) >>> tbl2 = pd.DataFrame({'col1': [1, 3], 'col2': ['a', 'c']}) >>> lt.drop('test') >>> lt.create('test',{'col1':'int','col2':'text'}) >>> lt.write(tbl,'test') >>> lt.write_nodup(tbl2,'test',['col1'],"col1 < 4") >>> lt.run("select * from test") col1 col2 0 1 a 1 2 b 2 3 c