Main takeaways:

Get rid of Paginator.

New topics

Alan: gain experience with best practices for ORM performance
Alan: looking to migrate away from Doctrine ORM - a more abstracted layer where we hydrate from other stores
Alan: trying Apache Unomi - low priority according to Sikandar
Alan: trying to move away iteratively
Jan: using repositories as subscriber dependencies can slow down the kernel

Jan: using repositories as subscriber dependencies can slow down the kernel

Code Block
languagephp
// Pseudocode: $dic is the service container, $eventSubscriber the Doctrine event registry.
$doctrine = makeMeADoctrine($dic, $eventSubscriber);

function makeMeADoctrine($dic, $eventSubscriber) {
    // Every $dic->get() below instantiates its service during bootstrap, whether or not the listener ever fires.
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService1::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService2::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService3::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService4::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService5::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService6::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService8::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService9::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService10::class)));
    $eventSubscriber->addListener(new MyListener($dic->get(OtherService11::class)));
}

OtherService* must be lazy (https://symfony.com/doc/current/service_container/lazy_services.html)
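
As a rough illustration of what "lazy" buys here, the sketch below hands a listener a factory closure instead of the finished dependency, so the dependency is only built when the listener first runs. This is a hand-rolled stand-in, not Symfony's lazy-service proxy mechanism; `ExpensiveRepository` and `LazyListener` are hypothetical names that do not come from the Mautic code base.

```php
<?php

declare(strict_types=1);

// Hypothetical expensive dependency; counts how many times it is constructed.
final class ExpensiveRepository
{
    public static int $instances = 0;

    public function __construct()
    {
        self::$instances++;
    }

    public function findName(int $id): string
    {
        return 'contact-' . $id;
    }
}

// Hypothetical listener that receives a factory instead of the repository itself.
final class LazyListener
{
    private ?ExpensiveRepository $repository = null;

    /** @param Closure(): ExpensiveRepository $factory */
    public function __construct(private Closure $factory)
    {
    }

    public function onEvent(int $contactId): string
    {
        // The repository is built on first use, not when the listener is registered.
        $this->repository ??= ($this->factory)();

        return $this->repository->findName($contactId);
    }
}

// Registering the listener costs nothing beyond creating the closure.
$listener = new LazyListener(static fn (): ExpensiveRepository => new ExpensiveRepository());
```

The trade-off is the one raised later in the discussion: the construction cost does not disappear, it moves to the first call of `onEvent()`.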

Jan: listener fetched at runtime

Code Block
languagephp
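// Sketch only: a real wrapper must implement every EntityManagerInterface method.
$lazyEntityManager = new class implements EntityManagerInterface
{
    // Assigned at runtime, just before first use, so nothing is built during bootstrap.
    public ?EntityManagerInterface $inner = null;
    public function flush(): void { $this->inner->flush(); }
};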

Marco: by upgrading Symfony, you get a lazy EntityManager by default, because they need
it to reset the service (background workers).
Alan: https://github.com/mautic/mautic/blob/340f3440c23fbd48f34fc26b35e45170ebdfcc87/app/bundles/UserBundle/Config/config.php#L364-L371
Marco: that already breaks laziness, but we can mark the repository lazy. Perhaps make
mautic.user.repository.user_token lazy.
Marco: if you put laziness in hot paths, it won't gain you anything.
Sikandar: laziness moves initialization time into the runtime. Bootstrap is not such a big issue, so we
need to be selective.
Marco: we need more information about a performance profile.
Sikandar: problem is not really at application-side (memory/cpu/latency).
Alan: clearly not a major concern. It may help in background processing.
Marco: are the background processes spawned once per task, or kept alive?
Alan: goes back to multi-tenancy.
Marco: maybe we can reboot individual services (EntityManager), worked fine for some integration test suite
in the past.
Marco: https://symfony.com/doc/current/reference/dic_tags.html#kernel-reset
Marco, Sikandar: only about stateful services
Sikandar: are connections pooled?
Marco: no, and resetting services would probably also reset a connection pool, if we had one
Alan: we don't have connection pooling
Marco: Xdebug profiler output (cachegrind.*.out file)
Marco: problem probably not here
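
The idea of resetting stateful services between background tasks, which Marco linked above, could look roughly like the sketch below. It uses a local `Resettable` interface as a stand-in so it runs without Symfony; in a Symfony application the equivalent contract is `Symfony\Contracts\Service\ResetInterface` together with the `kernel.reset` tag. `ContactCache` and `runTasks()` are hypothetical names chosen for illustration.

```php
<?php

declare(strict_types=1);

// Local stand-in for a reset contract, so the sketch runs without Symfony.
interface Resettable
{
    public function reset(): void;
}

// Hypothetical stateful service: keeps loaded rows in memory between calls.
final class ContactCache implements Resettable
{
    /** @var array<int, string> */
    private array $loaded = [];

    public function remember(int $id, string $name): void
    {
        $this->loaded[$id] = $name;
    }

    public function count(): int
    {
        return count($this->loaded);
    }

    public function reset(): void
    {
        // Drop per-task state so the next task starts clean.
        $this->loaded = [];
    }
}

/**
 * Runs each task, then resets every stateful service.
 *
 * @param list<callable(): void> $tasks
 * @param list<Resettable>       $services
 */
function runTasks(array $tasks, array $services): void
{
    foreach ($tasks as $task) {
        $task();
        foreach ($services as $service) {
            $service->reset();
        }
    }
}

$cache = new ContactCache();
$sizes = [];
runTasks(
    [
        function () use ($cache, &$sizes): void { $cache->remember(1, 'a'); $sizes[] = $cache->count(); },
        function () use ($cache, &$sizes): void { $cache->remember(2, 'b'); $sizes[] = $cache->count(); },
    ],
    [$cache],
);
```

Only services that actually hold state need to take part, which matches the "only about stateful services" remark above.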

Alan: trouble with the paginator

Marco: paginator - as soon as you have issues, move away from it
Marco: tells you "how much", "give me a page"
Marco: explaining pagination abstraction - it's high level, work with every page
Marco: move to split methods if you can, write custom SQL/DQL if you have performance problems
Alan: explaining that InnoDB is slow at counting
Marco: pagination works like this

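The query being paginated:

Code Block
languagedql
SELECT a, b
FROM MyUsers a
JOIN a.posts b

Count query ("how much"):

Code Block
languagedql
SELECT COUNT(DISTINCT a)
FROM MyUsers a
JOIN a.posts b

Identifier query for the requested page (offset and limit are applied to this one):

Code Block
languagedql
SELECT DISTINCT a.id
FROM MyUsers a
JOIN a.posts b

Fetch query, restricted to those identifiers:

Code Block
languagedql
SELECT a, b
FROM MyUsers a
JOIN a.posts b
WHERE a.id IN (:ids)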

Broken query: assume 2 users with 1000 posts each.
The following query returns 1 user with only 100 posts hydrated: the result is wrong, and so is the in-memory object graph.

Code Block
languagedql
SELECT a, b
FROM MyUsers a 
JOIN a.posts b
LIMIT 100
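
To make the failure mode above concrete, here is a small simulation in plain PHP arrays rather than a real database. It assumes the joined rows come back grouped by user, which is an assumption for the sake of the example and not something a database guarantees without an ORDER BY.

```php
<?php

declare(strict_types=1);

// Simulated result of "users JOIN posts": 2 users with 1000 posts each.
$joinedRows = [];
foreach ([1, 2] as $userId) {
    for ($postId = 1; $postId <= 1000; $postId++) {
        $joinedRows[] = ['user_id' => $userId, 'post_id' => $postId];
    }
}

// LIMIT 100 cuts joined rows, not users.
$limitedRows = array_slice($joinedRows, 0, 100);

// Group the rows back into users, roughly what hydration does.
$users = [];
foreach ($limitedRows as $row) {
    $users[$row['user_id']][] = $row['post_id'];
}

printf("%d user(s), first user has %d post(s)\n", count($users), count($users[1]));
```

The limit is consumed by the first user's posts, so the second user never appears and the first user's collection is silently truncated.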

Simpler query does not need paginator:

Code Block
languagedql
SELECT a, p
FROM MyUsers a
LEFT JOIN a.profile p # this is a *-to-one association

Jan: problem with large numeric offsets - offset seems to become problematic
Marco: could force it to make a range query by using identifiers (find first identifier after X)
Jan: https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/
Jan: asking about a tool/library that implements this
Marco: IMO avoid more tools here, write SQL. Explaining OLTP (OnLine Transaction Processing) vs reporting
Marco: suggesting to do more SQL
Alan: not afraid of writing more SQL
Marco: avoid SQL generators, write SQL by hand; avoiding magic also avoids unpredictable performance
Alan: segmentation is the biggest issue
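
The range-query approach Marco describes above ("find first identifier after X") might be hand-written along these lines. This is a minimal sketch, assuming a `contacts` table with an integer primary key `id` and an `email` column (both hypothetical), and an in-memory SQLite database via PDO purely so the example is self-contained; the same SQL shape applies to MySQL.

```php
<?php

declare(strict_types=1);

/**
 * Returns up to $limit rows whose id is greater than $afterId.
 *
 * @return list<array{id: int, email: string}>
 */
function fetchContactsAfter(PDO $pdo, int $afterId, int $limit): array
{
    // The primary key index can seek directly to :after, so no rows are skipped by counting.
    $statement = $pdo->prepare(
        'SELECT id, email FROM contacts WHERE id > :after ORDER BY id ASC LIMIT :limit'
    );
    $statement->bindValue(':after', $afterId, PDO::PARAM_INT);
    $statement->bindValue(':limit', $limit, PDO::PARAM_INT);
    $statement->execute();

    $rows = [];
    foreach ($statement->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $rows[] = ['id' => (int) $row['id'], 'email' => (string) $row['email']];
    }

    return $rows;
}

// Demo data in an in-memory SQLite database (assumes the pdo_sqlite extension).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT NOT NULL)');
$insert = $pdo->prepare('INSERT INTO contacts (id, email) VALUES (:id, :email)');
for ($id = 1; $id <= 10; $id++) {
    $insert->execute([':id' => $id, ':email' => "user{$id}@example.com"]);
}

// Walk the table page by page, carrying the last seen id forward.
$lastId = 0;
$pages = [];
while (($page = fetchContactsAfter($pdo, $lastId, 4)) !== []) {
    $pages[] = array_column($page, 'id');
    $lastId = $page[count($page) - 1]['id'];
}
```

Unlike a numeric OFFSET, each page starts from the last identifier already seen, so the cost of a page does not grow with how deep into the table it is. The price is that jumping straight to an arbitrary page number is no longer possible.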

Schema change -> migration to other stores

Marco: suggesting a different schema for transactional and reporting data.
Sikandar: use a new data store (column storage) for this, but it's in the pipeline and won't happen soon.
Alan: that's also the problem - Doctrine kinda forced us to stick to MySQL
Marco: explaining a simple example of an ES (event-sourced) repository:

Code Block
languagephp
<?php

final class ContactInformationRepository
{
    public function get(ContactId $id): Contact
    {
        $events = $this->connection->query('SELECT * FROM EVENTS .... WHERE ...');

        $contact = Contact::bare();

        foreach ($events as $e) {
            $contact->applyEvent($e);
        }

        return $contact;
    }
}
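
The repository above relies on `Contact::bare()` and `applyEvent()`, which the notes do not show. A minimal sketch of what such an aggregate could look like follows; the event shape (an array with `type` and `payload`) and the two event types are assumptions made up for this example, not part of the discussion.

```php
<?php

declare(strict_types=1);

// Hypothetical aggregate: its state is rebuilt purely from events.
final class Contact
{
    private ?string $email = null;

    /** @var list<string> */
    private array $visitedPages = [];

    private function __construct()
    {
    }

    // Empty starting point before any event has been applied.
    public static function bare(): self
    {
        return new self();
    }

    /** @param array{type: string, payload: array<string, string>} $event */
    public function applyEvent(array $event): void
    {
        switch ($event['type']) {
            case 'email_changed':
                $this->email = $event['payload']['email'];
                break;
            case 'page_visited':
                $this->visitedPages[] = $event['payload']['url'];
                break;
            // Unknown event types are ignored in this sketch.
        }
    }

    public function email(): ?string
    {
        return $this->email;
    }

    /** @return list<string> */
    public function visitedPages(): array
    {
        return $this->visitedPages;
    }
}

// Replaying events in order yields the current state.
$events = [
    ['type' => 'email_changed', 'payload' => ['email' => 'old@example.com']],
    ['type' => 'page_visited', 'payload' => ['url' => '/pricing']],
    ['type' => 'email_changed', 'payload' => ['email' => 'new@example.com']],
];

$contact = Contact::bare();
foreach ($events as $event) {
    $contact->applyEvent($event);
}
```

Because the object is only ever the result of replaying events, the event table stays the single source of truth and the in-memory shape can change without a schema migration.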

Sikandar: what about an entity that has a column with JSON?
Marco: doesn't need to be a repository
Alan: we're looking at a way to get a single source of truth (event-sourcing potentially), and it's managed by the API.
Alan: then we have queries to perform, like segmentation, like "who has visited X in the last Y days"
Alan: we could store in unstructured JSON table, and allow searching
Alan: it's possible to index JSON columns now - https://stackoverflow.com/a/61040738
Marco: suggesting splitting two different schemas for reading/writing again
Sikandar: we attempted using replication (1:1 schema too)
Marco: referring to CQRS, avoid it until really necessary
Marco: start with query objects

Code Block
languagephp
<?php

final class GetCountOfContactsInState
{
    public function __invoke(ContactState $state): int
    {
        // ...
    }
}

Queries can then be made swappable (domain has definition, infrastructure has implementation):

Code Block
languagephp
<?php

namespace Mautic\SomeComponent\Infrastructure;

final class GetCountOfContactsInSegment implements \Mautic\SomeComponent\Domain\ContactsInSegment
{
    public function __invoke(SegmentDefinition $segment): int
    {
        // ...
    }
}
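
To show the swap in practice, the sketch below defines one query contract and two implementations, then calls both through the same consumer. All names (`CountContactsInState`, the two classes, `describeState()`, the `contacts` table) are hypothetical, a plain string stands in for `ContactState`, and the SQL variant assumes the pdo_sqlite extension only so the example is self-contained.

```php
<?php

declare(strict_types=1);

// Domain side: only the contract lives here.
interface CountContactsInState
{
    public function __invoke(string $state): int;
}

// Infrastructure side: hand-written SQL, no ORM involved.
final class SqlCountContactsInState implements CountContactsInState
{
    public function __construct(private PDO $pdo)
    {
    }

    public function __invoke(string $state): int
    {
        $statement = $this->pdo->prepare('SELECT COUNT(*) FROM contacts WHERE state = :state');
        $statement->execute([':state' => $state]);

        return (int) $statement->fetchColumn();
    }
}

// Alternative implementation, e.g. for tests or for a different store.
final class InMemoryCountContactsInState implements CountContactsInState
{
    /** @param list<string> $states one entry per contact */
    public function __construct(private array $states)
    {
    }

    public function __invoke(string $state): int
    {
        return count(array_keys($this->states, $state, true));
    }
}

// The consumer depends on the contract only, so implementations can be swapped.
function describeState(CountContactsInState $count, string $state): string
{
    return sprintf('%d contact(s) in state "%s"', $count($state), $state);
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE contacts (id INTEGER PRIMARY KEY, state TEXT NOT NULL)');
$pdo->exec("INSERT INTO contacts (state) VALUES ('active'), ('active'), ('bounced')");

$fromSql = describeState(new SqlCountContactsInState($pdo), 'active');
$fromMemory = describeState(new InMemoryCountContactsInState(['active', 'bounced', 'active']), 'active');
```

Moving such a query to a reporting store later would then mean adding a third implementation, with no change to the callers.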

Alan: so suggestion is to move from repositories to more granular queries
Marco: suggesting to use the ORM for storing/modifying information (OLTP), and move to query objects that perhaps
avoid the ORM overall for larger batch tasks

Next week

  • Perf profile - xdebug output

  • Managing object relationships without enforcing FK constraints

  • Performance of ORM-generated queries vs native SQL queries: will it make any difference?

  • add link to Zoom call directly to calendar entry
