Commits · ad749443b8d6343552a28bfde16445eb2a99cefe · 小白蛋 / Pixie

This project is mirrored from https://gitee.com/cowcomic/pixie.git. Pull mirroring failed 2 years ago.

13 Aug, 2020 1 commit

PP-2117: Add the concept of an 'external' GRPC Sink to the compiler · ad749443

Natalie Serrino authored 4 years ago

Summary:
In the new end-to-end streaming Vizier, GRPC sinks will stream results directly to the query broker, rather than Kelvin buffering up the final results and sending them in a batch to the query broker. That means there are two 'types' of GRPC sinks in that system: internal GRPC sinks, which send mid-query, intermediate data to GRPC sources on another Carnot instance, and external GRPC sinks, which send complete results to the query broker or another external address. In the internal GRPC Sink case, the node only needs to know the destination ID of the GRPC Source node that it's sending the data to. In the external GRPC Sink case, the node needs to know the name and schema of the output table.

In this diff, the concept of the external GRPC sink is introduced. Changes to rules are made so that things like automatically adding a limit to memory sinks will also apply to these external GRPC sinks. Next diff, the compiler will change so that px.display automatically results in these external GRPC sinks rather than memory sinks.
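
A minimal sketch of the internal/external distinction described above, in Python rather than the compiler's own code. The class and field names (`GRPCSink`, `destination_id`, `table_name`, `schema`) and the example column types are assumptions made for illustration, not the actual IR node definition.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GRPCSink:
    """Hypothetical model of a GRPC sink node; all field names are illustrative."""
    destination_id: Optional[int] = None     # internal: ID of the GRPC source node
    table_name: Optional[str] = None         # external: name of the output table
    schema: Optional[Dict[str, str]] = None  # external: column name -> type name

    def is_external(self) -> bool:
        return self.table_name is not None

    def validate(self) -> None:
        if self.is_external():
            if self.schema is None:
                raise ValueError("external sink needs an output schema")
        elif self.destination_id is None:
            raise ValueError("internal sink needs a destination ID")


internal = GRPCSink(destination_id=7)
external = GRPCSink(table_name="http_stats",
                    schema={"time_": "TIME64NS", "latency": "INT64"})
internal.validate()
external.validate()
```

Rules that previously matched only memory sinks could then also match any sink for which `is_external()` is true.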

Test Plan: added / existing

Reviewers: philkuz, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-2117

Differential Revision: https://phab.corp.pixielabs.ai/D5966

GitOrigin-RevId: 472d7f33454b3b1df3ee5183e80cf62966de1d6f


04 Aug, 2020 1 commit

PP-2100: Make metadata ctx keys such as pod_id work in compiler by removing outdated _attr logic. · 6d607aa7

Natalie Serrino authored 4 years ago

Summary:
Previously, after the metadata refactor, only upid worked as a ctx key to access properties such as pod_name.
However, other keys such as pod_id in the network_stats table should also be able to produce values such as pod_name.
This diff cleans up the remnants of the _attr-based logic and treats upid and other keys such as pod id consistently.

Test Plan: ran a query to do df.ctx['pod'] on network_stats table which didn't work before, existing unit tests.

Reviewers: philkuz, jamesbartlett, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-2100

Differential Revision: https://phab.corp.pixielabs.ai/D5869

GitOrigin-RevId: 2529c2ff32be4e80d3c022bf5cb1099ca8911356


28 Jul, 2020 1 commit

[PP-2059] Bypass Topological sort in rules and other parts of the compiler · 22b127d1

Phillip Kuznetsov authored 4 years ago

Summary:
Topological sort was a huge contributor to execution time in the compiler, but we didn't need to use it in a lot of places. This diff reduces the compiler's dependence on topo sort, especially in the rule definitions.

Most notably, added an argument to the rules constructor so that Rule::Execute() can take the nodes in any order rather than topologically sorting them first.
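
The idea can be sketched roughly as follows, in Python for brevity. `RecordingRule`, `use_topological_order`, and the `parents_of` map are hypothetical stand-ins and not the real `Rule` interface; the sketch only shows that a rule which does not care about order can skip the sort.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class RecordingRule:
    """Hypothetical rule that records the order in which it visits nodes."""

    def __init__(self, use_topological_order: bool):
        self.use_topological_order = use_topological_order
        self.visited = []

    def apply(self, node: str) -> bool:
        self.visited.append(node)
        return False  # this toy rule never changes the graph

    def execute(self, parents_of: dict) -> bool:
        if self.use_topological_order:
            # Parents are visited before children, at the cost of a sort.
            order = list(TopologicalSorter(parents_of).static_order())
        else:
            # Any order will do, so walk the node map as it is stored.
            order = list(parents_of)
        changed = False
        for node in order:
            changed = self.apply(node) or changed
        return changed


# node -> set of parent nodes
plan = {"sink": {"map"}, "map": {"src"}, "src": set()}
sorted_rule = RecordingRule(use_topological_order=True)
unsorted_rule = RecordingRule(use_topological_order=False)
sorted_rule.execute(plan)
unsorted_rule.execute(plan)
```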

Test Plan: Updated tests to match the changes, had to refactor some that assumed an order of nodes.

Reviewers: nserrino, jamesbartlett, #engineering

Reviewed By: jamesbartlett, #engineering

JIRA Issues: PP-2059

Differential Revision: https://phab.corp.pixielabs.ai/D5731

GitOrigin-RevId: d30902087c772ecc86682834032a1e8f62f840bb


14 Jul, 2020 1 commit

Allow specifying start_time, and end_time as absolute times. · e9d8cb76

James Bartlett authored 4 years ago

Summary: TSIA

Test Plan: Added a test, tried on staging.

Reviewers: philkuz, #engineering

Reviewed By: philkuz, #engineering

Subscribers: zasgar

Differential Revision: https://phab.corp.pixielabs.ai/D5581

GitOrigin-RevId: 12912b38ef775913dc9cca9f121b78cf814fcaef


23 Jun, 2020 1 commit

Use ST_NONE instead of ST_UNSPECIFIED as default. · 82c5d325

James Bartlett authored 5 years ago

Summary: @philkuz pointed out that the intention of ST_UNSPECIFIED was to catch bugs where the semantic type wasn't set properly. This diff changes the system to treat ST_NONE as the default semantic type and catch all for inference rules. This surfaced a bug where the Metadata service wasn't correctly propagating Semantic type information from stirling (even though this information doesn't exist in stirling yet, it wasn't correctly propagating the ST_NONEs).

Test Plan: Checked that all output columns in the UI now have sem type ST_NONE and not ST_UNSPECIFIED.

Reviewers: philkuz, nserrino, #engineering

Reviewed By: nserrino, #engineering

Subscribers: philkuz

Differential Revision: https://phab.corp.pixielabs.ai/D5308

GitOrigin-RevId: 691b725a74819c1b9a7097bae208ba19fd12116a


10 Jun, 2020 1 commit

SemTypes Pt. 4: Add ResolveTypesRule to Analyzer. · 7a3aecd7

James Bartlett authored 5 years ago

Summary: Adds analyzer rule that uses ResolveType machinery from previous diff to set the resolved type for each operator.

Test Plan: Added a test for the rule, but most of the testing is in the previous diff.

Reviewers: #engineering, nserrino

Reviewed By: #engineering, nserrino

Differential Revision: https://phab.corp.pixielabs.ai/D5221

GitOrigin-RevId: cb8106bb168c09548f527e831bf06271bdf0a73e


09 Jun, 2020 1 commit

SemTypes Pt. 3: Add ResolveType trait to Operators and Expressions. · c5de8bb5

James Bartlett authored 5 years ago

Summary: Adds the machinery to be able to resolve types of different IR nodes. The next diff will add an analyzer rule to do that.

Test Plan: Added type_resolution tests.

Reviewers: philkuz, nserrino, #engineering

Reviewed By: nserrino, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D5220

GitOrigin-RevId: 1fcbde270ea6026c8b169a5972711bd1ca16b03a


22 May, 2020 1 commit

[Cleanup] Rename fields and misc documentation in ir_nodes · d354f70f

Phillip Kuznetsov authored 5 years ago

Summary: Wanted to do some of this for a while, just paying my dues

Test Plan: tested and passes so far

Reviewers: nserrino, jamesbartlett, #engineering

Reviewed By: nserrino, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D4944

GitOrigin-RevId: e6cfac38d5780331da94704c1f433aadb507d0b6


21 May, 2020 1 commit

[PP1940] Fix issue where filter pushed down even when parent operator had multiple children · 29ca78f8

Phillip Kuznetsov authored 5 years ago

Summary: Filter push down fix that adds an extra stopping condition for push down ( if parent.children().size() > 1 ) and moves the rule into the optimizer.
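
A rough illustration of why the stopping condition matters, written in Python. The `Op` class and `new_parent_for_filter` are hypothetical, and the sketch assumes the filter can always move past a map; the real rule also has to check which columns the filter uses.

```python
class Op:
    """Hypothetical single-parent operator node."""

    def __init__(self, name, kind, parent=None):
        self.name = name
        self.kind = kind
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def new_parent_for_filter(filter_op):
    """Return the operator the filter should end up directly below."""
    parent = filter_op.parent
    while parent.kind == "map":
        # Stopping condition: moving above a parent with several children
        # would also filter the rows seen by its other children.
        if len(parent.children) > 1:
            break
        parent = parent.parent
    return parent


# src -> map1 -> map2 -> filter, and map1 also feeds other_sink.
src = Op("src", "source")
map1 = Op("map1", "map", src)
map2 = Op("map2", "map", map1)
flt = Op("filter", "filter", map2)
other_sink = Op("other_sink", "sink", map1)
shared_case = new_parent_for_filter(flt)

# A purely linear plan: the filter can move all the way up to the source.
lin_src = Op("lin_src", "source")
lin_map = Op("lin_map", "map", lin_src)
lin_filter = Op("lin_filter", "filter", lin_map)
linear_case = new_parent_for_filter(lin_filter)
```

In the first plan the filter stops below `map1`, so `other_sink` still sees unfiltered rows.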

Test Plan: tested in optimizer and seems to work, previously failing query works as well

Reviewers: nserrino, jamesbartlett, #engineering

Reviewed By: nserrino, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D4914

GitOrigin-RevId: fa2f9fc4bb02ead7cf8b50831710aa261fb01d2e


19 May, 2020 2 commits

PP-1928 remove old metadata conversion classes · 7253ab63

Natalie Serrino authored 5 years ago

Summary: TSIA, depends on D4882.

Test Plan: existing should pass

Reviewers: philkuz, jamesbartlett, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-1928

Differential Revision: https://phab.corp.pixielabs.ai/D4883

GitOrigin-RevId: 20d7ae02072b0604d3d592e1a363140a10693d14


PP-1928, PP-1916: Update analyzer to use new metadata handling · 0b24c59d

Natalie Serrino authored 5 years ago

Summary: Depends on D4877. This diff completes the refactor of how metadata is handled in the compiler. It is converted to a metadata generating function which has annotations attached to it labeling what kind of metadata type it is. These annotations are passed down when the output of that func is assigned to another column name or used in a group by clause or filter. These annotations are the basis for agent metadata filtering. This diff removes the old behavior where metadata columns were created via a map and called _attr_<metadata_name>. Next diff I will delete the obsolete classes.

Test Plan: added/existing

Reviewers: philkuz, jamesbartlett, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-1928, PP-1916

Differential Revision: https://phab.corp.pixielabs.ai/D4882

GitOrigin-RevId: 0ea09742110cbd31e5ee86785584f67535eccc6e


18 May, 2020 1 commit

PP-1928 pt 2: Add two new rules to refactor existing approach to metadata in the compiler. · f41bb865

Natalie Serrino authored 5 years ago

Summary:
Depends on D4868. This is part of a series of changes to make the way the compiler handles metadata work a bit more easily with agent metadata pruning.
The new flow will remove the _attr columns created by maps, and MetadataIRs will now be considered intermediate nodes that are compiled to a func that generates them.

New analyzer flow:
1. MetadataIR has its MetadataProperty set in ResolveMetadataPropertyRule
2. MetadataIR is converted to the func that creates it (upid_to_pod_name(upid) for example) in ConvertMetadataRule. ConvertMetadataRule also sets the metadata_type annotation on the output func.
3. PropagateExpressionAnnotationsRule (D4868) will propagate the metadata_type annotation in the generated func to all of the places that generated func is renamed or set to a column. For example, if that func is used to produce a column, which is then renamed and used as a group by column, the annotations will follow it as long as the column is intact.

Then, consumers of the IR that need to know about metadata type of an expression (such as metadata agent pruning) only have to look at the metadata type annotation. This avoids the problem
of those consumers needing to check for the various cases that exist today.
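
Step 2 of the flow above can be sketched as follows, in Python. Only `upid_to_pod_name(upid)` and the `metadata_type` annotation come from the summary; the `Func` class, `convert_metadata`, and the `<key>_to_<property>` naming pattern for other keys are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    """Hypothetical stand-in for the func expression a MetadataIR compiles to."""
    name: str
    args: tuple
    annotations: dict = field(default_factory=dict)


def convert_metadata(prop: str, key_column: str) -> Func:
    """Replace a metadata reference with the func that computes it from a key column."""
    return Func(
        name=f"{key_column}_to_{prop}",      # naming pattern is an assumption
        args=(key_column,),
        annotations={"metadata_type": prop},  # consumers only need to read this
    )


pod_name = convert_metadata("pod_name", "upid")
```

A consumer such as agent pruning would then inspect `pod_name.annotations["metadata_type"]` instead of pattern-matching on how the column was produced.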

Test Plan: added

Reviewers: philkuz, jamesbartlett, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-1928

Differential Revision: https://phab.corp.pixielabs.ai/D4877

GitOrigin-RevId: 67e5ca2faa13e816b39f18d668e9cbd1766d893a


16 May, 2020 1 commit

PP-1928: Add annotations struct to ExpressionIR and rule to propagate annotations between operators. · de3b8060

Natalie Serrino authored 5 years ago

Summary:
We are moving toward a new model for handling metadata about expressions. This is the first diff in a sequence that will replace the existing handling of metadata in our compiler. We want to track annotations such as metadata type on ExpressionIRs. That way, it is easy for a given consumer of an ExpressionIR to know if that ExpressionIR represents a metadata field, even if it originated from a column that has been processed and renamed since assignment. This annotations concept can be extended to other types of annotations in the future, but is currently limited to metadata. (The major use case right now is generalizing agent pruning).

If a column has annotations, and then is reassigned or processed by a downstream operator, the output column should potentially share the same annotations depending on the context. If it is a simple name reassignment in a map, then the annotations from the input column should be copied over. There are some more complex cases with things like union (where all of the input columns need to agree on a particular annotation for it to go in the output). As a result, this diff adds a rule that computes downstream annotations for columns derived from other columns or expressions that have annotations associated with them.
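
The two cases described above (a simple rename in a map, and a union whose inputs must agree) can be sketched like this in Python. The function names and the dict-based representation of annotations are hypothetical, not the rule's actual data structures.

```python
def propagate_map(input_annotations, renames):
    """renames maps output column -> input column (pure renames only)."""
    return {
        out: input_annotations[src]
        for out, src in renames.items()
        if src in input_annotations
    }


def propagate_union(per_input_annotations):
    """Keep an annotation for a column only if every input agrees on it."""
    first, *rest = per_input_annotations
    return {
        col: ann
        for col, ann in first.items()
        if all(other.get(col) == ann for other in rest)
    }


renamed = propagate_map({"pod": "POD_NAME"}, {"p": "pod", "x": "latency"})
unioned = propagate_union([
    {"pod": "POD_NAME", "svc": "SERVICE_NAME"},
    {"pod": "POD_NAME", "svc": "SERVICE_ID"},
])
```

Here `p` inherits the annotation of `pod`, while the union drops `svc` because its inputs disagree.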

Next up will be to tie the rule into the analyzer, and to set the metadata type annotation for metadata fields produced by expressions such as df.ctx['pod_id']. Then, some of the existing metadata logic/classes will be removed.

Test Plan: added

Reviewers: philkuz, jamesbartlett, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-1928

Differential Revision: https://phab.corp.pixielabs.ai/D4868

GitOrigin-RevId: 49f425b6e6ada2e2b3f7ec1a5c9ff4936d91bc6f


15 May, 2020 1 commit

Remove MetadataLiteralIR and MetadataFormatRule · 93c505d4

Natalie Serrino authored 5 years ago

Summary:
These were added based off of a prior design for the way that pruning agent plans based on pod, service, etc filtering would work.
As a result they are obsolete at least for now.

Test Plan: n/a

Reviewers: philkuz, #engineering

Reviewed By: philkuz, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D4848

GitOrigin-RevId: a223eb5249c65be2dbd40249d988a53b29b8b497


13 May, 2020 1 commit

PP-1811: Add aggs to filter pushdown, hook filter pushdown into analyzer · c069fee9

Natalie Serrino authored 5 years ago

Summary: TSIA. This code still doesn't support joins or unions for the pushdown, that will come in a subsequent diff.

Test Plan: existing/added

Reviewers: philkuz, jamesbartlett, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-1811

Differential Revision: https://phab.corp.pixielabs.ai/D4762

GitOrigin-RevId: 8d58c056fc49bc62f48ba7cc71a606e1ac060087


08 May, 2020 1 commit

PP-1811: Initial implementation of filter pushdown · 9a64a866

Natalie Serrino authored 5 years ago

Summary:
We want to move filters as early in the query as possible for efficiency reasons.
This diff implements a rule to do that, which only works on certain kinds of operators for now.
This diff includes a small refactor of a utility in ir_nodes.h so that it could be used in rules.cc as well.
Next up will be supporting pushing filters past aggs, joins, and unions.

Test Plan: added

Reviewers: philkuz, jamesbartlett, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PP-1811

Differential Revision: https://phab.corp.pixielabs.ai/D4665

GitOrigin-RevId: 4e829528592397c3ea83ff998d32d20c00869fbb


22 Apr, 2020 1 commit

Add EXPECT_MATCH macro for planner code · 66a98807

Phillip Kuznetsov authored 5 years ago

Summary: EXPECT_MATCH is something I've wanted to do, but never did. It's a nice replacement for EXPECT_TRUE(Match(...)) because it's shorter and provides a better error message.

Test Plan: Tested in follow up diff

Reviewers: nserrino, zasgar, #engineering

Reviewed By: zasgar, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D4468

GitOrigin-RevId: 4062875b21587b7b91f73ad0df137203212418e9


22 Mar, 2020 1 commit

[PL-1638] (Compiler) Change Distributed splitter to rely on Independent Graphs at the Operator Level · 7d60e159

Phillip Kuznetsov authored 5 years ago

Summary:
Depends on D4004. Planner splits the logical plan into the data source side (~= pems) and processor side (= kelvin) using the Dag->IndependentGraphs. If we reuse a non-op variable on both sides of this split, then the IndependentGraphs algorithm will return the entire graph as one entity, which is incorrect.
Instead had to make a new Independent Graphs algorithm that only looks at Operators as the actual graph.
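
A small Python sketch of the difference, treating independent graphs as connected components. The node names and the `independent_graphs` helper are hypothetical; the point is only that edges touching a shared non-operator node are ignored.

```python
from collections import defaultdict


def independent_graphs(edges, operators):
    """Connected components, counting only edges between two operators."""
    adjacent = defaultdict(set)
    for a, b in edges:
        if a in operators and b in operators:
            adjacent[a].add(b)
            adjacent[b].add(a)
    seen, groups = set(), []
    for start in sorted(operators):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacent[node] - seen)
        groups.append(group)
    return groups


operators = {"src1", "agg1", "sink1", "src2", "sink2"}
edges = [
    ("src1", "agg1"), ("agg1", "sink1"), ("src2", "sink2"),
    # "window" is a non-operator value reused by both pipelines.
    ("window", "agg1"), ("window", "sink2"),
]
operator_only = independent_graphs(edges, operators)
all_nodes = independent_graphs(edges, operators | {"window"})
```

Counting `window` as part of the graph glues the two pipelines into one component, which is the failure mode the summary describes.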

Test Plan: old queries work, problematic query added as a new test in logical planner

Reviewers: nserrino, jamesbartlett, #engineering

Reviewed By: nserrino, #engineering

JIRA Issues: PL-1638

Differential Revision: https://phab.corp.pixielabs.ai/D4024

GitOrigin-RevId: b2b3a9955ec17b3d8bd52434fde91b6b754b747d


20 Feb, 2020 1 commit

PL-1497 part 2: Update filter column pruning logic to use column selection in output to fix bug · 417f46b5

Natalie Serrino authored 5 years ago

Summary:
We had a problem with filters where if you had two filters in a row, where both filters used columns for the filter condition
that were not used downstream anywhere, you would end up with misaligned output relations. We don't need to output columns that are
only used for evaluating a filter condition, so this diff updates the logic to take advantage of column selection in the filter node.

Test Plan: added

Reviewers: philkuz, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1497

Differential Revision: https://phab.corp.pixielabs.ai/D3551

GitOrigin-RevId: 070a55f15d43b39c3b59df403b47f6095ab590b3


17 Feb, 2020 1 commit

[PL-1474] Rename namespaces for the compiler -> planner refactor. · e0867fec

Phillip Kuznetsov authored 5 years ago

Summary: Depends on D3526. Final diff in the refactor -> rename namespaces. This logically sorts everything in the planner.

Test Plan: pass tests from before, hope nothing breaks.

Reviewers: nserrino, zasgar, michelle, #engineering, jamesbartlett

Reviewed By: jamesbartlett

Differential Revision: https://phab.corp.pixielabs.ai/D3528

GitOrigin-RevId: 4a863b5fe2fe13a397f95e458bdcc4a2d074657c


16 Feb, 2020 3 commits

[PL-1473] Rename compiler to planner and move around files · 9c075b6b

Phillip Kuznetsov authored 5 years ago

Summary: Depends on D3525. Follow up diff will have a rename of the namespaces.

Test Plan: tests pass, just a refactor

Reviewers: #engineering, zasgar, jamesbartlett, nserrino

Reviewed By: jamesbartlett

JIRA Issues: PL-1473

Differential Revision: https://phab.corp.pixielabs.ai/D3526

GitOrigin-RevId: 610a06247db629d186b945ed520e20a22fc2b139


[PL-1472] make a dedicated rules and metadata dir in compiler · 1793eac7

Phillip Kuznetsov authored 5 years ago

Summary: Depends on D3524. TSIA

Test Plan: everything tests and works.

Reviewers: nserrino, zasgar, jamesbartlett, #engineering

Reviewed By: zasgar, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3525

GitOrigin-RevId: bde2e5053ed366540dee8768cde31d8990b118f7


[PL-1471] Move distributed planner to own directory · fb3343e7

Phillip Kuznetsov authored 5 years ago

Summary: Part of the cleanup of the compiler directory, moving distributed to its own directory. Eventually we'll move compiler components to its own directory as well.

Test Plan: tested to make sure everything works after making this move.

Reviewers: nserrino, zasgar, jamesbartlett, #engineering

Reviewed By: zasgar, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3524

GitOrigin-RevId: 8bbbda506c4bb14918742a23489495e53502888c


13 Feb, 2020 1 commit

PL-1452: Add rule to limit output rows per memory sink where appropriate · aecd7148

Natalie Serrino authored 5 years ago

Summary:
Depends on D3495. This rule adds a limit above memory sinks when the node above it isn't a limit.
If the node above it is a limit, it edits the limit where appropriate.
Next will be to hook this into the analyzer.
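
As a rough Python sketch of the rule's behavior on a linear plan: the `limit_sink` helper and the tuple-based plan are hypothetical, and reading "edits the limit where appropriate" as "tightens a limit that exceeds the cap" is an assumption.

```python
def limit_sink(chain, max_rows):
    """chain is a list of (kind, value) tuples ending in ("sink", None)."""
    *body, sink = chain
    if body and body[-1][0] == "limit":
        # Assumption: an existing limit is only tightened, never loosened.
        body[-1] = ("limit", min(body[-1][1], max_rows))
    else:
        body.append(("limit", max_rows))
    return body + [sink]


no_limit = limit_sink([("src", None), ("sink", None)], 10000)
small_limit = limit_sink([("src", None), ("limit", 50), ("sink", None)], 10000)
large_limit = limit_sink([("src", None), ("limit", 10**6), ("sink", None)], 10000)
```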

Test Plan: added tests

Reviewers: philkuz, jamesbartlett, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1452

Differential Revision: https://phab.corp.pixielabs.ai/D3496

GitOrigin-RevId: 8416c8cd9037895f877d22a4a3abeaa5ac4ec8dd


11 Feb, 2020 1 commit

PL-754: Adding support for Rolling windowing in compiler and query language. · 1fc39e0f

James Bartlett authored 5 years ago

Summary:
Adds a .rolling function to support windowed aggregates, eg.
```
t1 = px.DataFrame(..., select=['time_', 'col1'])
t1 = t1.rolling('2s').agg(...)
```
The rolling function also supports an `on` parameter to specify which column to window on, however it currently only supports windowing on `time_`. `t1.rolling('2s')` is equivalent to `t1.rolling('2s', on='time_')` and `t1.rolling(10, on='col1')` is currently not supported but will be in the future.

Currently, the proto spec is undefined so attempting to compile to a proto will result in an Unimplemented error.

This PR also adds a GroupAcceptorIR that serves as an interface for any IR that can accept groups from a GroupBy op. This PR changes BlockingAgg to subclass GroupAcceptorIR, and the new RollingIR is also a subclass of GroupAcceptorIR. This allows for both `t1.groupby(...).rolling(...).agg(...)` and `t1.groupby(...).agg(...)` to be handled by the Merge rule.

Additionally, this PR adds support for the RollingIR window size parameter to accept time strings as above, or compile time expr evaluation such as `t1.rolling(1 + px.now())`.

Currently, the RollingIR is left in the graph, but once the spec for the proto is known it will likely need to be merged into the Agg.

Test Plan:
- Add tests to ensure new `MergeGroupByIntoGroupAcceptorRule` works for both `RollingIR` and `BlockingAggIR`.
- Add tests to ensure transition from `ConvertMemSourceStringTimesRule` to `ConvertStringTimeRule` still works for mem source and additionally works for Rolling now.
- Add tests that RollingIR node gets created properly.
- Add tests to check that compile time expr eval works for new Rolling op.

Reviewers: #engineering, philkuz, nserrino

Reviewed By: #engineering, nserrino

Subscribers: nserrino, philkuz

JIRA Issues: PL-754

Differential Revision: https://phab.corp.pixielabs.ai/D3365

GitOrigin-RevId: c59cfb51bb9f11d444ee6551c6092ed17710aa22


06 Feb, 2020 1 commit

PL-1189: remove px from UDF names · fc24d413

James Bartlett authored 5 years ago

Test Plan: Tests pass.

Reviewers: #engineering, philkuz, nserrino, zasgar

Reviewed By: #engineering, zasgar

Differential Revision: https://phab.corp.pixielabs.ai/D3440

GitOrigin-RevId: 55df3886123156ab3da4216225fcaed3c12d454c


30 Jan, 2020 1 commit

PL-1336: Add compiler rule to remove operators that aren't connected to a MemorySink · 75d58eea

James Bartlett authored 5 years ago

Summary:
Adds a compiler rule that determines which operators are connected to a MemorySink, and removes the operators that are not.

This rule is run before the rule that prunes unused columns.
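
One way to picture the rule named in the commit title is a backwards walk from every sink that keeps only what it reaches. This Python sketch uses a hypothetical `prune_unconnected` helper and a plain parent map, not the compiler's IR.

```python
def prune_unconnected(parents_of, sinks):
    """Keep only operators that feed, directly or indirectly, into a sink."""
    keep, stack = set(), list(sinks)
    while stack:
        op = stack.pop()
        if op in keep:
            continue
        keep.add(op)
        stack.extend(parents_of.get(op, []))
    return {op: ps for op, ps in parents_of.items() if op in keep}


plan = {
    "src": [],
    "map": ["src"],
    "sink": ["map"],
    "dead_agg": ["src"],  # computed but never displayed
}
pruned = prune_unconnected(plan, sinks=["sink"])
```

Running this before column pruning means columns used only by `dead_agg` no longer count as used.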

Test Plan: Added a compiler level test for this behaviour, as well as two rule level tests.

Reviewers: #engineering, nserrino, philkuz

Reviewed By: #engineering, philkuz

JIRA Issues: PL-1336

Differential Revision: https://phab.corp.pixielabs.ai/D3299

GitOrigin-RevId: 0cf40938fb3c6d8e104a8de9e6cecc27f557f72d


27 Jan, 2020 1 commit

PL-1349: Enforce that all unit tested rules clean up their stray IRNodes · efb409e0

Natalie Serrino authored 5 years ago

Summary: This adds an automatic check on rules_test to ensure that each of the applied rules cleans up after itself. This also comes along with some cleanup of the way ir_nodes handle certain method calls. Depends on D3302.

Test Plan: existing

Reviewers: #engineering, philkuz, jamesbartlett

Reviewed By: #engineering, philkuz

JIRA Issues: PL-1349

Differential Revision: https://phab.corp.pixielabs.ai/D3303

GitOrigin-RevId: 6aab4d98788b52a18eb4f1c35764a554cb783b5c


24 Jan, 2020 2 commits

PL-1349: IRNodes should clean up their stray nodes · 9619250d

Natalie Serrino authored 5 years ago

Summary:
Add a utility for nodes to remove their prior children if and only if those children have no other parents.
Have rules replace DeferNodeDeletion with DeleteNode, and support skipping deleted nodes in rule execution.
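
The "if and only if those children have no other parents" condition can be sketched in Python as below. `drop_prior_children` and the `parents_of` map are hypothetical names, not the utility added in this diff.

```python
def drop_prior_children(parents_of, node, prior_children):
    """Detach prior_children from node; delete those left with no parents."""
    deleted = []
    for child in prior_children:
        parents_of[child].discard(node)
        if not parents_of[child]:
            del parents_of[child]
            deleted.append(child)
    return deleted


# col_b is shared with filter1, so it must survive map1 letting go of it.
parents_of = {"col_a": {"map1"}, "col_b": {"map1", "filter1"}}
deleted = drop_prior_children(parents_of, "map1", ["col_a", "col_b"])
```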

Test Plan: existing

Reviewers: philkuz, jamesbartlett, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1349

Differential Revision: https://phab.corp.pixielabs.ai/D3292

GitOrigin-RevId: 371fa81264dece4f74c6d7a43e546c92e7671c75


Misc compiler cleanup · e3b82df0

Natalie Serrino authored 5 years ago

Summary: DeleteNodeAndChildren is not used and it is also not safe for nodes with multiple parents (ExpressionIR with multiple parents got introduced after it was written). Also DeleteNode was leaving the nodes in id_to_nodes_map which led to inconsistent results between the IR dag and IR node map (each of which can be checked depending on the situation).

Test Plan: edited a test to work with these fixes

Reviewers: philkuz, jamesbartlett, #engineering

Reviewed By: philkuz, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3283

GitOrigin-RevId: 24290054ab30dc4454a300562a65eae65428d6b2


25 Jan, 2020 1 commit

Fix bug where filter columns get pruned out when they aren't required in the filter output relation · 0b827dca

Natalie Serrino authored 5 years ago

Summary:
Filter uses the same output relation as the input relation. As a result, if we had a plan like this:

src -> filter -> map

where map produced columns col1, col2, but filter operated on col0.

The way it was previously implemented, col0 would be pruned out by our new rule, thus breaking the filter condition.

What we need to do is have filter output col0, col1, col2 so it can still use col0, and then have col0 get pruned by
whatever downstream node doesn't need it. This diff does that.
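
In Python, the fix amounts to something like the following. `filter_output_columns` is a hypothetical helper; the column names follow the example in the summary, with an extra `col3` added to show that genuinely unused columns are still pruned.

```python
def filter_output_columns(input_relation, downstream_needs, condition_columns):
    """Columns a filter must keep: what downstream needs plus what it tests."""
    needed = set(downstream_needs) | set(condition_columns)
    # Preserve the input relation's column order.
    return [col for col in input_relation if col in needed]


kept = filter_output_columns(
    input_relation=["col0", "col1", "col2", "col3"],
    downstream_needs=["col1", "col2"],  # the map's inputs
    condition_columns=["col0"],         # used only by the filter condition
)
```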

Test Plan: added

Reviewers: philkuz, jamesbartlett, #engineering

Reviewed By: philkuz, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3290

GitOrigin-RevId: 5fe09fa45eafdcc52d8cafc2af3567ea51cd5ebc


24 Jan, 2020 1 commit

Rename pl to px in the compiler · 3056735a

Phillip Kuznetsov authored 5 years ago

Summary: PL is old news, px is the new news. Get out and use px.

Test Plan: all tests pass with the new changes, no new functionality.

Reviewers: zasgar, nserrino, michelle, jamesbartlett, #engineering

Reviewed By: nserrino, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3273

GitOrigin-RevId: 970e66dcc75ecbf455319819f348d1b9daa93e0d


23 Jan, 2020 1 commit

PL-1349: Add rule to clean up stray IRNodes that are not connected to an operator. · 10c4a5ad

Natalie Serrino authored 5 years ago

Summary:
We don't want to have random stray nodes that are lying around and not used by the plan.
We can consider adding it as a DCHECK to other rule batches or the ast_visitor code if we don't want to rely on it for pruning and instead expect the other rules to take care of themselves. In that case we would run the rule and DCHECK that it didn't do any work on the graph, and if it did we would know that we were leaving stray nodes.

Test Plan: added

Reviewers: philkuz, zasgar, jamesbartlett, #engineering

Reviewed By: philkuz, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3271

GitOrigin-RevId: 83f0db4325f0aa6228e9190af9de86c526ec814f


21 Jan, 2020 2 commits

PL-1319: Add rule to prune unused output columns. · 4b8a02e7

Natalie Serrino authored 5 years ago

Summary: Depends on D3224, D3230, D3231.

Test Plan: added tests for various cases

Reviewers: philkuz, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1319

Differential Revision: https://phab.corp.pixielabs.ai/D3236

GitOrigin-RevId: 9a5ec28ff446a926b7bfc74c6190f5adb0b5ced4


(part of) PL-1197 via PL-1319: Make sure sink output relation considers specified output columns · 6e6bafa7

Natalie Serrino authored 5 years ago

Summary: Currently the output relation of a sink is just taken from the parent, however sinks do have a field called out_columns_ which is meant to support selecting a subset of output columns. PL-1319 uncovered these two cases in some of its logic, so here is part of the work in solving PL-1197 which supports specifying output columns in pl.display in the QL.

Test Plan: added

Reviewers: philkuz, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1197, PL-1319

Differential Revision: https://phab.corp.pixielabs.ai/D3241

GitOrigin-RevId: 44251ed7cff6b191a96eb0c82b7ce6a58b04ad62


17 Jan, 2020 1 commit

PL-1319: Refactor UnionIRs in preparation for column pruning rule · 78ad8677

Natalie Serrino authored 5 years ago

Summary:
PL-1319 will add in a compiler optimization that prunes columns that are unnecessary to the script output.
In order to do this nicely, it makes sense for the resolution of a column to its ultimate index in the plan to happen as late in the game as possible, as the input relation to a given operator may shift as columns are pruned.
UnionIR previously stored column indexes to refer to columns, rather than a ColumnIR.
This is error prone given the upcoming optimization, because those indexes could become stale. We want all IRNodes to use ColumnIR types when referring to columns so that it's easier for us to figure out what is able to be pruned.
As a result, I moved UnionIR to using ColumnIR instead of indexes.
This exposed an issue where we were not resolving column indexes for columns/operators added after the analyzer phase.
As a result, a step in the distributed analyzer was added to resolve column indexes, so that nodes that were added to the plan after the initial analyzer phase still have their columns resolved.

Depends on D3171 and D1319.

Test Plan: added tests

Reviewers: philkuz, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1317, PL-1319

Differential Revision: https://phab.corp.pixielabs.ai/D3202

GitOrigin-RevId: 4b6c3ef40d3a44b1bcd32d0500682d5ff2f6fdf5


15 Jan, 2020 2 commits

PL-1319 part 1: refactor ResolveColumn into ResolveColumnType and ResolveColumnIndex · 30aae58c

Natalie Serrino authored 5 years ago

Summary: This refactor is in service of PL-1319, which will add an analyzer phase that prunes unused columns from the plan. In order to do that, we want to defer setting the column index as long as possible, because the input/output relation of each operator may change when its columns are pruned. During the analyzer phase, we want to entirely deal in terms of column names, rather than mixing both, only moving over to column indexes at the end. Right now, column type and column index were being set at the same point, but column type is needed for many phases of the analyzer, and column index is only needed at the very end when ToProto is called on the operators. Therefore, these get set separately and I added a rule where the column index is resolved at the final step of the analyzer.
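
A short Python sketch of why the index is resolved last. `resolve_column_indexes` and the example relations are hypothetical; the point is that an index computed before pruning goes stale, while a name stays valid.

```python
def resolve_column_indexes(column_names, parent_relation):
    """Map column names to positions in the parent's (final) output relation."""
    return [parent_relation.index(name) for name in column_names]


before_pruning = ["time_", "upid", "latency"]
after_pruning = ["time_", "latency"]  # upid turned out to be unused

early = resolve_column_indexes(["latency"], before_pruning)
late = resolve_column_indexes(["latency"], after_pruning)
```

The early result points at position 2, which no longer exists after pruning; resolving against the final relation gives position 1.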

Test Plan: existing should pass

Reviewers: philkuz, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1319

Differential Revision: https://phab.corp.pixielabs.ai/D3171

GitOrigin-RevId: 11bcb7fe9f663a91097a2c153e78fef555d53eac


PL-1319: Refactor Rule into Rule and DistributedRule · b57a192f

Natalie Serrino authored 5 years ago

Summary:
PL-1319 as well as PL-1317 will require the addition of a distributed analyzer, which will share many characteristics with the current analyzer. We want to be able to have the same logic that exists for rules on IRs be able to execute on DistributedPlans as well.

Analyzer:
traverses IR, applies Rules to IRNodes

DistributedAnalyzer:
traverses DistributedPlan, applies DistributedRules to CarnotInstances.

We want the rule executor and the graph walking stuff, as well as the patterns for Apply() on Rules to be shared on both these cases.
The first use case for a DistributedAnalyzer is to have IR rules executed on each of the Plans for each CarnotInstance, so support for
that kind of rule is added in this diff.
However there will be other kinds of DistributedRules that actually modify the top-level DAG of DistributedPlan in the future. For example,
an optimization that removes an entire CarnotInstance from the DistributedPlan because a filter condition causes there to be no data present
on a given node that will be included on the output.

Next step will be to edit RuleExecutor to be generalized to be able to use Rules or DistributedRules.
After that, the DistributedAnalyzer will be added to the codebase.
Then, a rule which needs to be added for unions post-distributed splitting will be added to the DistributedAnalyzer.
Eventually, much of the distributed splitting/stitching logic can be re-articulated as part of the DistributedAnalyzer.

Test Plan: existing, added

Reviewers: philkuz, zasgar, #engineering

Reviewed By: philkuz, #engineering

JIRA Issues: PL-1317, PL-1319

Differential Revision: https://phab.corp.pixielabs.ai/D3191

GitOrigin-RevId: 6e42870fbb51c0454a6248c078c1011b71ae05a9


11 Jan, 2020 1 commit

Remove unused function · 28a3df91

Zain Asgar authored 5 years ago

Summary: Not used except in test, probably left over.

Test Plan: bazel test //...

Reviewers: michelle, nserrino, philkuz, #engineering

Reviewed By: nserrino, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3142

GitOrigin-RevId: 9ef59aff6a75b30e97ae059a417fdb066698e28e


09 Jan, 2020 1 commit

Revert the changes to the RegistryInfo pointer in compiler state because I misjudged the lifetime of that object. · d4cc565c

Phillip Kuznetsov authored 5 years ago

Summary: I misjudged the lifetime of the RegistryInfo object. It turns out that it should last longer than the compiler state object, something that will be necessary with upcoming changes to the LogicalPlanner object.

Test Plan: tests pass with changes, this is a nearly 1 to 1 reversion of the changes in a prev commit.

Reviewers: zasgar, nserrino, michelle, #engineering

Reviewed By: michelle, #engineering

Differential Revision: https://phab.corp.pixielabs.ai/D3111

GitOrigin-RevId: e2e0666d7da34d9eea44e3c358eb3f3164b7251e
