
fix default_paramstyle #27

Merged (5 commits), Oct 16, 2019
Conversation

koxudaxi
Owner

This PR fixes default_paramstyle for the dialect.

Related Issue

#19

@codecov

codecov bot commented Oct 15, 2019

Codecov Report

Merging #27 into master will not change coverage.
The diff coverage is 100%.


@@          Coverage Diff          @@
##           master    #27   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           4      4           
  Lines         393    391    -2     
  Branches       48     48           
=====================================
- Hits          393    391    -2
Impacted Files Coverage Δ
pydataapi/dbapi.py 100% <100%> (ø) ⬆️
pydataapi/pydataapi.py 100% <100%> (ø) ⬆️
pydataapi/dialect.py 100% <100%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b2ea173...f68ed18.

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

With 2267650 I still receive the pydantic error.

@koxudaxi
Owner Author

koxudaxi commented Oct 15, 2019

I just fixed it. It works:

    from sqlalchemy.orm import sessionmaker
    Session = sessionmaker()
    Session.configure(bind=engine)
    s = Session()
    s.bulk_save_objects([Pets(name='pet_name1'), Pets(name='pet_name2')])
    s.commit()

output

2019-10-16 04:20:19,575 INFO sqlalchemy.engine.base.Engine select version()
2019-10-16 04:20:19,575 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 04:20:19,588 INFO sqlalchemy.engine.base.Engine select current_schema()
2019-10-16 04:20:19,588 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 04:20:19,603 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2019-10-16 04:20:19,603 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 04:20:19,611 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2019-10-16 04:20:19,611 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 04:20:19,618 INFO sqlalchemy.engine.base.Engine show standard_conforming_strings
2019-10-16 04:20:19,618 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 04:20:19,642 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-10-16 04:20:19,654 INFO sqlalchemy.engine.base.Engine INSERT INTO pets (name) VALUES (:name)
2019-10-16 04:20:19,654 INFO sqlalchemy.engine.base.Engine ({'name': 'pet_name1'}, {'name': 'pet_name2'})
2019-10-16 04:20:19,665 INFO sqlalchemy.engine.base.Engine COMMIT

Process finished with exit code 0

@koxudaxi
Owner Author

@Rubyj
Would you please test it?

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

@koxudaxi I still receive an error but it is unrelated:

Traceback (most recent call last):
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/chalice/app.py", line 1082, in _get_view_function_response
    response = view_function(**function_args)
  File "/home/rjacobs/git/vcfparserlambda/src/parser/app.py", line 26, in index
    parse_vcf(n)
  File "/home/rjacobs/git/vcfparserlambda/src/parser/app.py", line 39, in parse_vcf
    BulkInsert().go(session, variants)
  File "/home/rjacobs/git/vcfparserlambda/src/parser/chalicelib/queries.py", line 3, in go
    session.bulk_save_objects(inserts)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2700, in bulk_save_objects
    False,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2888, in _bulk_save_mappings
    transaction.rollback(_capture_exception=True)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2882, in _bulk_save_mappings
    render_nulls,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 102, in _bulk_insert
    bookkeeping=return_defaults,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1084, in _emit_insert_statements
    c = cached_connections[connection].execute(statement, multiparams)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1468, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1224, in _execute_context
    cursor, statement, parameters, context
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 549, in do_executemany
    cursor.executemany(statement, parameters)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/pydataapi/dbapi.py", line 140, in executemany
    results = self._data_api.batch_execute(operation, seq_of_parameters)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/pydataapi/pydataapi.py", line 399, in batch_execute
    **options.build()
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.BadRequestException: An error occurred (BadRequestException) when calling the BatchExecuteStatement operation: Number of SQL parameters specified is more than 1000

Is there a way for me to increase this limit? Or will this bulk insert take too long.

@koxudaxi
Owner Author

koxudaxi commented Oct 15, 2019

I found it in https://docs.aws.amazon.com/rdsdataservice/latest/APIReference/API_BatchExecuteStatement.html

parameterSets
The parameter set for the batch operation.

The maximum number of parameters in a parameter set is 1,000.

I think the limitation cannot be changed. We could ask AWS Support about it...

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

I found it in https://docs.aws.amazon.com/rdsdataservice/latest/APIReference/API_BatchExecuteStatement.html

parameterSets
The parameter set for the batch operation.

The maximum number of parameters in a parameter set is 1,000.

Yeah, I just saw that as well. Do you have any ideas for workaround? Build multiple lists each being 1000 long maybe?

@koxudaxi
Owner Author

Build multiple lists each being 1000 long maybe?

@Rubyj
If you can limit the number of objects per bulk insert, I think that is the best way, because the Data API has a few limitations on operations. Could you cap the count?

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

Build multiple lists each being 1000 long maybe?

@Rubyj
If you can limit the number of objects per bulk insert, I think that is the best way, because the Data API has a few limitations on operations. Could you cap the count?

I have about 60k objects to bulk insert. In theory, I can make 60 lists each with 1000 objects and bulk insert them all.
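Splitting a large collection into batches of at most 1,000 can be done with a small stdlib-only helper. This is a sketch of the workaround discussed above; `chunked` is a hypothetical name, and the commented usage assumes a `session` and an `objects` list like the ones in this thread:

```python
from itertools import islice

def chunked(items, size=1000):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Hypothetical usage: one bulk_save_objects() call per 1,000-object batch.
# for batch in chunked(objects, 1000):
#     session.bulk_save_objects(batch)
#     session.commit()
```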

@koxudaxi
Owner Author

There is another way: the library could divide the objects into groups of 1,000 and issue the operations itself. I'm still deciding whether to do that, because I think it is a tricky method.

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

There is another way: the library could divide the objects into groups of 1,000 and issue the operations itself. I'm still deciding whether to do that, because I think it is a tricky method.

I will try to divide my data by 1000. It should not be too hard.

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

@koxudaxi I was able to batch my data by 1000. However, now I run into the following error.

Traceback (most recent call last):
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/chalice/app.py", line 1082, in _get_view_function_response
    response = view_function(**function_args)
  File "/home/rjacobs/git/vcfparserlambda/src/parser/app.py", line 26, in index
    parse_vcf(n)
  File "/home/rjacobs/git/vcfparserlambda/src/parser/app.py", line 41, in parse_vcf
    BulkInsert().go(session, annotation_list)
  File "/home/rjacobs/git/vcfparserlambda/src/parser/chalicelib/queries.py", line 3, in go
    session.bulk_save_objects(inserts)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2700, in bulk_save_objects
    False,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2888, in _bulk_save_mappings
    transaction.rollback(_capture_exception=True)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 2882, in _bulk_save_mappings
    render_nulls,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 102, in _bulk_insert
    bookkeeping=return_defaults,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/orm/persistence.py", line 1084, in _emit_insert_statements
    c = cached_connections[connection].execute(statement, multiparams)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
    return meth(self, multiparams, params)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
    distilled_params,
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1468, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1224, in _execute_context
    cursor, statement, parameters, context
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 549, in do_executemany
    cursor.executemany(statement, parameters)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/pydataapi/dbapi.py", line 140, in executemany
    results = self._data_api.batch_execute(operation, seq_of_parameters)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/pydataapi/pydataapi.py", line 399, in batch_execute
    **options.build()
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/rjacobs/git/vcfparserlambda/venv/lib/python3.7/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (SerializationException) when calling the BatchExecuteStatement operation: 

This happens on the model with a relationship to the other model:

class VariantAnnotation(Base):
    variant = relationship('Variant', backref='annotations')
variant_object = Variant()
variant_annotation = VariantAnnotation(variant=variant_object)

@koxudaxi
Owner Author

@Rubyj
Would you please set echo=True?
I'm interested in the SerializationException.

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

@Rubyj
Would you please set echo=True?
I'm interested in the SerializationException.

I set echo=True for SQLAlchemy's create_engine, but because I am using chalice I do not see any logging. Hmm. How can I make this work with chalice?

@koxudaxi
Owner Author

@Rubyj
Sorry, I must go to bed. I will continue it tomorrow.

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

@Rubyj
Sorry, I must go to bed. I will continue it tomorrow.

Ok! I will continue to debug today and post what I find for you to see tomorrow. I can run outside of chalice using regular python. Thank you!!

Even outside of chalice with echo=True in create_engine I see no extra logging. 😞
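One thing worth trying (an assumption, not a confirmed fix): echo=True writes through the standard logging module, so if no handler is installed the records are silently dropped. Configuring logging directly may surface the SQL:

```python
import logging

# Install a handler on the root logger, then raise the level of
# SQLAlchemy's engine logger so SQL statements and parameters are emitted.
logging.basicConfig(level=logging.INFO)
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
```

In a Lambda/chalice environment a root handler may already be attached (in which case basicConfig is a no-op), so adjusting the level of the existing handler might be needed instead.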

@koxudaxi
Owner Author

Running pydataapi (MySQL) with SAM works. The query is:

    result: ResultProxy = engine.execute("select * from pets")

The output is shown in two screenshots (2019-10-16, omitted here).

@Rubyj
Contributor

Rubyj commented Oct 15, 2019

Running pydataapi (MySQL) with SAM works. The query is:

    result: ResultProxy = engine.execute("select * from pets")

The output is shown in two screenshots (2019-10-16, omitted here).

Weird. I wonder why I cannot get it to work in my SAM project with Postgres. I can try again. Did you have to do anything special?

@koxudaxi I took a look at the data that is being sent to the Data API from Python. It looks like the foreign key which I showed above is not included in the data that is being sent to _make_api_call().

Here is an example:

[
  {
    "name": "allele",
    "value": {
      "stringValue": "T"
    }
  },
  {
    "name": "annotation",
    "value": {
      "stringValue": "test"
    }
  },
  {
    "name": "annotation_impact",
    "value": {
      "stringValue": "test"
    }
  },
  {
    "name": "gene_name",
    "value": {
      "stringValue": "test"
    }
  },
  {
    "name": "gene_id",
    "value": {
      "stringValue": "test"
    }
  },
  {
    "name": "hgvs_c",
    "value": {
      "stringValue": "*test\u003eT"
    }
  },
  {
    "name": "hgvs_p",
    "value": {
      "doubleValue": "nan"
    }
  },
  {
    "name": "reference",
    "value": {
      "stringValue": "test"
    }
  }
]

I do not see the reference back to the Variant object included in this data. I make that reference as described here: https://stackoverflow.com/a/17330019/3439441

class Child(Base):
    parent_id = Column(Integer, ForeignKey('parent.id'))
    parent = relationship('Parent', backref='childs')

parent_object = Parent(name="test")
child_object = Child(parent=parent_object)
session.bulk_save_objects([parent_object])
session.bulk_save_objects([child_object])  # <- this is where the error occurs

Assigning the parent object like I did here in my code does not seem to make it to _make_api_call().

We could start with something like this:

u = User(user_name=u'dusual')
# no need to flush, no need to add `u` to the session because sqlalchemy becomes aware of the object once we assign it to c.user
c = Client(user=u, orgname="dummy_org")
session.add(c)
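Since SQLAlchemy's bulk operations are documented to skip relationship() cascades, another workaround is to flush the parent first and pass the foreign key column directly. A sketch against an in-memory SQLite engine (standing in here for the Data API engine):

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    parent = relationship(Parent, backref='children')

engine = create_engine('sqlite://')  # in-memory stand-in for the Data API engine
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Persist the parent with a regular add() + flush() so its primary key exists...
parent = Parent(name='test')
session.add(parent)
session.flush()

# ...then give the bulk path the foreign key column directly, because
# bulk_save_objects() does not follow relationship() assignments.
session.bulk_save_objects([Child(parent_id=parent.id)])
session.commit()
```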

@koxudaxi
Owner Author

koxudaxi commented Oct 16, 2019

I have tested simple related tables, and it works fine. However, I ran it against Aurora Serverless MySQL.
I will create a PostgreSQL cluster and test again tonight.

class Parent(base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    name = Column(String(255, collation='utf8_unicode_ci'), default=None)


class Child(base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    name = Column(String(255, collation='utf8_unicode_ci'), default=None)
    parent_id = Column(Integer, ForeignKey('parent.id'))
    parent = relationship('Parent', backref='childs')


s = Session()
parent_object = Parent(id=1, name="test")
child_object = Child(id=1, parent=parent_object)
s.bulk_save_objects([parent_object])
s.bulk_save_objects([child_object])
...

@koxudaxi I took a look at the data that is being sent to the Data API from Python. It looks like the foreign key which I showed above is not included in the data that is being sent to

I think SQLAlchemy splits the SQL into parent and child statements.
Here are the raw operations:

2019-10-16 14:01:58,414 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2019-10-16 14:01:58,414 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 14:01:58,536 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'lower_case_table_names'
2019-10-16 14:01:58,536 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 14:01:58,665 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2019-10-16 14:01:58,666 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 14:01:59,105 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2019-10-16 14:01:59,105 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 14:01:59,165 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2019-10-16 14:01:59,165 INFO sqlalchemy.engine.base.Engine {}
2019-10-16 14:01:59,294 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-10-16 14:01:59,354 INFO sqlalchemy.engine.base.Engine INSERT INTO parent (id, name) VALUES (:id, :name)
2019-10-16 14:01:59,354 INFO sqlalchemy.engine.base.Engine {'id': 1, 'name': 'test'}
2019-10-16 14:01:59,457 INFO sqlalchemy.engine.base.Engine INSERT INTO child (id) VALUES (:id)
2019-10-16 14:01:59,457 INFO sqlalchemy.engine.base.Engine {'id': 1}

Also, I'm interested in the value.

  {
    "name": "hgvs_p",
    "value": {
      "doubleValue": "nan"
    }
  },

nan may be an invalid value.

@koxudaxi
Owner Author

@Rubyj
I created a Postgres Aurora cluster and deployed an app with SAM. It works fine.
Also, I tested bulk insert on a simple related table. It's OK.

If you can show me the DDL for creating the tables and your SQLAlchemy table classes, I can try it.

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

@koxudaxi

Also, I'm interested in the value.

  {
    "name": "hgvs_p",
    "value": {
      "doubleValue": "nan"
    }
  },

nan may be an invalid value.

This is strange. I am not sure why it is described as doubleValue. That column is defined as:

hgvs_p = Column(String(100))

@Rubyj
I created a Postgres Aurora cluster and deployed an app with SAM. It works fine.
Also, I tested bulk insert on a simple related table. It's OK.

If you can show me the DDL for creating the tables and your SQLAlchemy table classes, I can try it.

Yes. Here they are:

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy import Column, Integer, String, Float, ForeignKey

Base = declarative_base()


class Variant(Base):
    __tablename__ = 'variant'

    id = Column(Integer, primary_key=True)
    chrom = Column(String(5))
    pos = Column(String(20))
    rs_id = Column(String(50))
    ref = Column(String(50))
    alt = Column(String(50))
    qual = Column(Float)


class VariantAnnotation(Base):
    __tablename__ = 'variant_annotation'

    id = Column(Integer, primary_key=True)
    variant_id = Column(Integer, ForeignKey('variant.id'))
    variant = relationship(Variant, backref='annotations')
    allele = Column(String(50))
    annotation = Column(String(50))
    annotation_impact = Column(String(50))
    gene_name = Column(String(50))
    gene_id = Column(String(50))
    hgvs_c = Column(String(100))
    hgvs_p = Column(String(100))
    reference = Column(String(500))

What is the best way to get DDL from serverless? I took a look at the table information and it matched these models:

(table_catalog = postgres, table_schema = public, table_name = variant_annotation for all rows)

column_name       | pos | column_default                                 | is_nullable | data_type         | max_length
id                | 1   | nextval('variant_annotation_id_seq'::regclass) | NO          | integer           | NULL
variant_id        | 2   | NULL                                           | YES         | integer           | NULL
allele            | 3   | NULL                                           | YES         | character varying | 50
annotation        | 4   | NULL                                           | YES         | character varying | 50
annotation_impact | 5   | NULL                                           | YES         | character varying | 50
gene_name         | 6   | NULL                                           | YES         | character varying | 50
gene_id           | 7   | NULL                                           | YES         | character varying | 50
hgvs_c            | 8   | NULL                                           | YES         | character varying | 100
hgvs_p            | 9   | NULL                                           | YES         | character varying | 100
reference         | 10  | NULL                                           | YES         | character varying | 500

@koxudaxi
Owner Author

koxudaxi commented Oct 16, 2019

@Rubyj
Thank you for showing me the classes.

What is the best way to get DDL from serverless?

Sorry, I don't know.
Did you create the tables from the SQLAlchemy classes, or from DDL?
I want to create the tables the same way.

This is strange. I am not sure why it is described as doubleValue. That column is defined as:
hgvs_p = Column(String(100))

It may be a bug 😖

PS. I just created tables from SQLAlchemy classes.

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

Did you create the tables from the SQLAlchemy classes, or from DDL?
I want to create the tables the same way.

Yes, I created the tables using alembic.

After creating models.

pip install alembic
alembic init alembic  # in the project directory

alembic.ini

sqlalchemy.url = postgresql+pydataapi://

alembic/env.py - edit run_migrations_online() to use create_engine

def run_migrations_online():
    """Run migrations in 'online' mode.

    In this scenario we need to create an Engine
    and associate a connection with the context.

    """
    aws_args = {'resource_arn': 'arn...',
                'secret_arn': 'arn...',
                'database': 'postgres'}
    url = config.get_main_option("sqlalchemy.url")
    connectable = create_engine(
        url,
        connect_args=aws_args,
        poolclass=pool.NullPool
    )
...

alembic revision --autogenerate -m "Added tables"
alembic upgrade head

@koxudaxi
Owner Author

koxudaxi commented Oct 16, 2019

alembic dumps an error:

  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

I created all the tables in a simple way:

Base.metadata.create_all(bind=engine)

It works.

I will try bulk insert.

I ran these lines:

variant_object = Variant(alt="test")
variant_annotation = VariantAnnotation(variant=variant_object)
s.bulk_save_objects([variant_object])
s.bulk_save_objects([variant_annotation])
s.commit()
result: ResultProxy = engine.execute(Select([Variant]))
print(result.fetchall())

output

[(2, True, True, True, True, 'test', True)]

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

alembic dumps an error:

  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

I created all the tables in a simple way:

Base.metadata.create_all(bind=engine)

It works.

I will try bulk insert.

Not sure about that error. What does your alembic setup look like? OK, I will also look at my models and make the lengths longer. The error may be there.

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

output

[(2, True, True, True, True, 'test', True)]

Why are there so many boolean values? Hmm, maybe this is a problem with my data then. I will need to investigate further why this thinks my string is a double. I just made the inserts work using text INSERT INTO statements from the code.

@koxudaxi
Owner Author

It's a bug!!

The real values are:

<class 'list'>: [{'longValue': 2}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'stringValue': 'test'}, {'isNull': True}]

Each isNull entry should come back as None.
I'm fixing that part 😉
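The inverse conversion can be sketched as follows (`convert_field` is a hypothetical name, not pydataapi's actual internal function): a field dict with isNull set must map to None rather than to the boolean True.

```python
def convert_field(field):
    """Convert one Data API field dict back to a Python value.

    A field like {'isNull': True} must become None, not True.
    """
    if field.get('isNull'):
        return None
    # The remaining keys ('longValue', 'stringValue', ...) carry the value.
    return next(iter(field.values()))

row = [{'longValue': 2}, {'isNull': True}, {'stringValue': 'test'}]
print([convert_field(f) for f in row])  # -> [2, None, 'test']
```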

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

It's a bug!!

The real values are:

<class 'list'>: [{'longValue': 2}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'isNull': True}, {'stringValue': 'test'}, {'isNull': True}]

Each isNull entry should come back as None.
I'm fixing that part

Aha! Although I think this is a different bug 😅 I wonder if it is related to my nan being a double.

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

@koxudaxi I tried increasing the length on my models but I still get the SerializationException. I wonder how I can debug this better or get a better error message.

@koxudaxi
Owner Author

@Rubyj
I set a breakpoint to debug with PyCharm on a local machine. Could you do that?
Could you do it?

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

@Rubyj
I set a breakpoint to debug with PyCharm on a local machine. Could you do that?

Yes, I can. I have been trying to set a breakpoint, but I do not know the correct place to set it to get a good error message.

@koxudaxi
Owner Author

Here!! This is where to check the request in batch_execute; sql there is the raw SQL.
(screenshot of the PyCharm debugger, 2019-10-17, omitted here)

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

@koxudaxi I got the logging to work!!!

I think nan is indeed the issue. I see 'hgvs_p': nan in the insert statement. It is supposed to be a string. I think I will need to clean my data. For some reason, pandas is treating missing values as {float} nan.

@koxudaxi
Owner Author

Should pydataapi clean up the data?

Then we should fix the converting method:

def convert_value(value: Any) -> Dict[str, Any]:
    if isinstance(value, bool):
        return {'booleanValue': value}
    elif isinstance(value, str):
        return {'stringValue': value}
    elif isinstance(value, int):
        return {'longValue': value}
    elif isinstance(value, float):
        return {'doubleValue': value}
    elif isinstance(value, bytes):
        return {'blobValue': value}
    elif value is None:
        return {'isNull': True}
    else:
        raise Exception(f'unsupported type {type(value)}: {value} ')
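One possible adjustment (a sketch of the idea, not the committed fix): treat non-finite floats such as numpy.nan as NULL inside the float branch, since the Data API rejects nan serialized as a doubleValue. Raising an explicit error for nan would be another defensible choice.

```python
import math
from typing import Any, Dict

def convert_value(value: Any) -> Dict[str, Any]:
    if isinstance(value, bool):  # bool must be checked before int
        return {'booleanValue': value}
    elif isinstance(value, str):
        return {'stringValue': value}
    elif isinstance(value, int):
        return {'longValue': value}
    elif isinstance(value, float):
        # float('nan') (e.g. numpy.nan) serializes as the string "nan",
        # which the Data API rejects; treat it as NULL instead.
        if math.isnan(value):
            return {'isNull': True}
        return {'doubleValue': value}
    elif isinstance(value, bytes):
        return {'blobValue': value}
    elif value is None:
        return {'isNull': True}
    else:
        raise Exception(f'unsupported type {type(value)}: {value}')
```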

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

Should pydataapi clean up the data?

Then we should fix the converting method:

def convert_value(value: Any) -> Dict[str, Any]:
    if isinstance(value, bool):
        return {'booleanValue': value}
    elif isinstance(value, str):
        return {'stringValue': value}
    elif isinstance(value, int):
        return {'longValue': value}
    elif isinstance(value, float):
        return {'doubleValue': value}
    elif isinstance(value, bytes):
        return {'blobValue': value}
    elif value is None:
        return {'isNull': True}
    else:
        raise Exception(f'unsupported type {type(value)}: {value} ')

It is working now!!! But it is taking a little while because I have around 100,000 models to insert. The problem was that I had numpy.nan in my data; numpy.nan is considered a float for some reason. I replaced all numpy.nan with None and the SerializationException went away.

Sorry for the trouble!
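Since numpy.nan is just a Python float('nan'), the cleanup can be done without a numpy dependency. A sketch of a row scrubber (`scrub_nan` is a hypothetical helper, not part of pydataapi):

```python
import math

def scrub_nan(mapping):
    """Return a copy of a row dict with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in mapping.items()
    }

row = {'hgvs_p': float('nan'), 'allele': 'T', 'qual': 1.5}
print(scrub_nan(row))  # -> {'hgvs_p': None, 'allele': 'T', 'qual': 1.5}
```

For pandas users, converting the DataFrame's missing values to None before building the ORM objects achieves the same thing.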

@koxudaxi
Owner Author

@Rubyj
Should we handle numpy.nan and other special cases?
I don't know who should handle the special values: users, or the SQLAlchemy driver?

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

I think it is OK for the user to handle this special value. Otherwise, numpy becomes a dependency of pydataapi.

My inserts have been running for over 5 minutes 🤦‍♂️

@koxudaxi koxudaxi merged commit 210c1e9 into master Oct 16, 2019
@koxudaxi
Owner Author

@Rubyj
Thank you for your advice.
OK, this comment thread is getting long, so I just merged.

We could fix a lot of bugs in this PR 🎉
I appreciate it very much 😄

@koxudaxi koxudaxi deleted the fix_default_paramstyle branch October 16, 2019 15:58
@Rubyj
Contributor

Rubyj commented Oct 16, 2019

@koxudaxi Of course!! Thank YOU!

One last thing. Do you know if bulk_insert_mappings() will work with pydataapi and foreign keys instead of bulk_save_objects()?

@koxudaxi
Owner Author

koxudaxi commented Oct 16, 2019

Sorry, I don't know. However, bulk_insert_mappings() works for a single table:

2019-10-17 01:03:05,610 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-10-17 01:03:05,634 INFO sqlalchemy.engine.base.Engine INSERT INTO parent (id, name) VALUES (:id, :name)
2019-10-17 01:03:05,634 INFO sqlalchemy.engine.base.Engine {'id': 3, 'name': 'test'}
2019-10-17 01:03:05,667 INFO sqlalchemy.engine.base.Engine COMMIT

@Rubyj
Contributor

Rubyj commented Oct 16, 2019

Sorry, I don't know. However, bulk_insert_mappings() works for a single table:

2019-10-17 01:03:05,610 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-10-17 01:03:05,634 INFO sqlalchemy.engine.base.Engine INSERT INTO parent (id, name) VALUES (:id, :name)
2019-10-17 01:03:05,634 INFO sqlalchemy.engine.base.Engine {'id': 3, 'name': 'test'}
2019-10-17 01:03:05,667 INFO sqlalchemy.engine.base.Engine COMMIT

OK, I am interested to see if it works for relationships. I can look.
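For reference, bulk_insert_mappings() takes plain column dicts, so a relationship has to be expressed as the foreign key value itself. A sketch against an in-memory SQLite engine (standing in here for the Data API), using simplified versions of the models above:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Variant(Base):
    __tablename__ = 'variant'
    id = Column(Integer, primary_key=True)
    alt = Column(String(50))

class VariantAnnotation(Base):
    __tablename__ = 'variant_annotation'
    id = Column(Integer, primary_key=True)
    variant_id = Column(Integer, ForeignKey('variant.id'))
    allele = Column(String(50))

engine = create_engine('sqlite://')  # in-memory stand-in for the Data API engine
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.bulk_insert_mappings(Variant, [{'id': 1, 'alt': 'test'}])
# Plain dicts: the relationship is carried by the foreign key value itself.
session.bulk_insert_mappings(
    VariantAnnotation,
    [{'variant_id': 1, 'allele': 'T'}, {'variant_id': 1, 'allele': 'A'}],
)
session.commit()
```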

@koxudaxi
Owner Author

I have released version 0.4.4 🎉
