2021-08-31 ORM Hydration and Relationships
Perf profile - xdebug output
there's a
cachegrind.out.134
try out running with--env=prod
tryopcache.validate_timestamps=0
- https://www.php.net/manual/en/opcache.configuration.php#ini.opcache.validate-timestamps
tune opcache: Fine-Tune Your Opcache Configuration to Avoid Caching Surprises
trycomposer install --classmap-authoritative
potentially interesting to use opcache preloading (PHP 7.4)Mautic\CoreBundle\Templating\TemplateReference->getPath
being called a lot of times - to investigate why
Alan: describing fallback system for templates
Marco: perhaps template system can be replaced with a compiler pass that generates a map of templates upfront
Sikandar: asking about whether this affects scalability
Marco: reducing outliers highlights more issue areas like PDO queries, I/O
Marco: maybe we should generate 1M+ leads
Marco perhapsAbstractHydrator
really is the perf bottleneck
Marco: explained hydrator overhead via Doctrine ORM Hydration Performance Optimization
Sikandar: we can just use a DTO named ctor / no reflection and use native SQL
final class LeadForSomeSpecificUseCaseDTO {
public $id;
public $name;
public function fromRow(array $resultSetRow): self { ... }
}
class GimmeContactsForMyUseCase
{
/** @return iterable<int, LeadForSomeSpecificUseCaseDTO> */
public function __invoke(...): iterable {
...
}
}
Sikandar: can also use the SQL query and spit directly to frontend
Marco: type safety to be considered
Managing obj relationships without enforcing FK constraints
Sikandar: when we analyze queries from Zabbix/PMM, we see a lot of queries waiting for lock,
10%~30%. System is very write-intensive, and referential integrity is enforced.
Reads and writes are probably contending with each other.
Sikandar: we're not using FKs except for cascades.
Alan: cascade operations are in place / used for deletes
Alan: deleting contacts => lots of cascade operations
Marco: is there a reproducer?
Alan: for deletes, yes, for FKs not really
Sikandar: no direct relationship between FK cascades and locks
Marco: was replication attempted?
Sikandar: replication made it worse
Marco: replication shouldn't affect it, as it operates on binlogs?
Marco: is high consistency really needed for reads? For example segments?
Sikandar: depends on use-case. For example seeing new contact.
Marco: is high consistency really a business scenario though?
Alan: replication was abnormally slow - took minutes to catch up, and would be hours behind
Alan: we shifted focus elsewhere
Alan: we don't want to create duplication/loops for information that could be processed twice, such as
broadcast email, like eligible contacts after a broadcast went out.
Alan, Marco: discussing Doctrine's MasterSlaveConnection
Sikandar: largest DB is probably 350Gb of leads
Alan: that's mostly metadata - related data around segmentation leads to large sizes
Alan: information about what was sent and such is in related tables / denormalized
Marco: 350Gb is "tiny" - why is it having so much write noise that we get locks?
Alan: creating a contact -> added to segments -> written to table that maps to segment -> log entry (immutable)
Alan: segment mapping table is kept up to date
Alan: log is an append-only data structure
Alan: campaigns also produce records for every step in a campaign, deciding which path a lead has taken
Alan: we generate records when emails go out, and when links are clicked, when users are redirected, etc.
Alan: landing page also leads to writes
Marco: are writes caused by frontend (HTTP) of background processes?
Alan: more background processes - they're real-time
Marco: is there a message queue for this?
Alan: we do, but community uses cronjobs
Marco: perhaps use messenger component to abstract this, and send to RMQ?
Sikandar: that shifts the problem though
Marco: perhaps a good idea to avoid flushing every queue operation to disk?
Marco/Sikandar: discussion about NoSQL use-case
Marco: probably relevant if we have frequent ALTER TABLE, but probably not the issue
Marco: are the logs in FK and transactional coupling with the active lead data?
Alan: no direct biz logic, primary use-case is warehousing
Alan: data only used for visualization/API
Marco: what if we remove strong integrity on logs? Is there PII in there? Could it go to a column storage/time-series DB?
Alan: some info in logs could contain PII
Marco: perhaps we could UDP-out the logs?
Next meeting:
\Mautic\CoreBundle\Entity\CommonRepository::saveEntities
performance
ORM generated queries vs Native SQL queries performance .. will it make any difference
add link to Zoom call directly to calendar entry