This project is mirrored from https://gitee.com/cowcomic/pixie.git.
- 13 Aug, 2020 1 commit
-
-
Natalie Serrino authored
Summary: In the new end-to-end streaming Vizier, GRPC sinks will stream results directly to the query broker, rather than Kelvin buffering up the final results and sending it in batch to the query broker. That means there are two 'types' of GRPC sinks in that system: internal GRPC sinks which send mid-query, intermediate data to GRPC sources on another Carnot instance, and external GRPC sinks which send complete results to the query broker or another external address. In the internal GRPC Sink case, the node only needs to know the destination ID of the GRPC Source node that it's sending the data to. In the external GRPC Sink case, the node needs to know the name and schema of the output table. In this diff, the concept of the external GRPC sink is introduced. Changes to rules are made so that things like automatically adding a limit to memory sinks will also apply to these external GRPC sinks. Next diff, the compiler will change so that px.display automatically results in these external GRPC sinks rather than memory sinks. Test Plan: added / existing Reviewers: philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-2117 Differential Revision: https://phab.corp.pixielabs.ai/D5966 GitOrigin-RevId: 472d7f33454b3b1df3ee5183e80cf62966de1d6f
-
- 04 Aug, 2020 1 commit
-
-
Natalie Serrino authored
Summary: Previously, after the metadata refactor, only upid worked as a ctx key to access properties such as pod_name. However, other keys such as pod_id in the network_stats table should also be able to produce values such as pod_name. This diff cleans up the remnants of the _attr-based logic and treats upid and other keys such as pod id consistently. Test Plan: ran a query to do df.ctx['pod'] on network_stats table which didn't work before, existing unit tests. Reviewers: philkuz, jamesbartlett, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-2100 Differential Revision: https://phab.corp.pixielabs.ai/D5869 GitOrigin-RevId: 2529c2ff32be4e80d3c022bf5cb1099ca8911356
-
- 28 Jul, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: Topological sort was a huge contributor to execution time in the compiler, but we didn't need to use it in a lot of places. This diff reduces the compiler's dependence on topo sort, especially in the rule definitions. Most notably, it adds an argument to the rules constructor so that Rule::Execute() can take the nodes in any order rather than topologically sorting them first. Test Plan: Updated tests to match the changes, had to refactor some that assumed an order of nodes. Reviewers: nserrino, jamesbartlett, #engineering Reviewed By: jamesbartlett, #engineering JIRA Issues: PP-2059 Differential Revision: https://phab.corp.pixielabs.ai/D5731 GitOrigin-RevId: d30902087c772ecc86682834032a1e8f62f840bb
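To illustrate the trade-off in this commit, here is a minimal Python sketch of a rule executor that only pays for a topological sort when a rule asks for it. The real compiler is C++; `topo_order`, `execute_rule`, and the `use_topo` flag are hypothetical names for illustration, not the Pixie API.

```python
from collections import deque

def topo_order(edges, nodes):
    """Kahn's algorithm: return nodes so that every parent precedes its children."""
    indegree = {n: 0 for n in nodes}
    for children in edges.values():
        for child in children:
            indegree[child] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in edges.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return order

def execute_rule(apply, edges, nodes, use_topo=False):
    """Apply `apply` to every node; sort only when the rule needs parent-first order.

    Returns the nodes for which `apply` reported a change, in visiting order.
    """
    order = topo_order(edges, nodes) if use_topo else list(nodes)
    return [n for n in order if apply(n)]
```

Rules that inspect each node independently can leave `use_topo` off and skip the sort entirely.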
-
- 14 Jul, 2020 1 commit
-
-
James Bartlett authored
Summary: TSIA Test Plan: Added a test, tried on staging. Reviewers: philkuz, #engineering Reviewed By: philkuz, #engineering Subscribers: zasgar Differential Revision: https://phab.corp.pixielabs.ai/D5581 GitOrigin-RevId: 12912b38ef775913dc9cca9f121b78cf814fcaef
-
- 23 Jun, 2020 1 commit
-
-
James Bartlett authored
Summary: @philkuz pointed out that the intention of ST_UNSPECIFIED was to catch bugs where the semantic type wasn't set properly. This diff changes the system to treat ST_NONE as the default semantic type and catch all for inference rules. This surfaced a bug where the Metadata service wasn't correctly propagating Semantic type information from stirling (even though this information doesn't exist in stirling yet, it wasn't correctly propagating the ST_NONEs). Test Plan: Checked that all output columns in the UI now have sem type ST_NONE and not ST_UNSPECIFIED. Reviewers: philkuz, nserrino, #engineering Reviewed By: nserrino, #engineering Subscribers: philkuz Differential Revision: https://phab.corp.pixielabs.ai/D5308 GitOrigin-RevId: 691b725a74819c1b9a7097bae208ba19fd12116a
-
- 10 Jun, 2020 1 commit
-
-
James Bartlett authored
Summary: Adds analyzer rule that uses ResolveType machinery from previous diff to set the resolved type for each operator. Test Plan: Added a test for the rule, but most of the testing is in the previous diff. Reviewers: #engineering, nserrino Reviewed By: #engineering, nserrino Differential Revision: https://phab.corp.pixielabs.ai/D5221 GitOrigin-RevId: cb8106bb168c09548f527e831bf06271bdf0a73e
-
- 09 Jun, 2020 1 commit
-
-
James Bartlett authored
Summary: Adds the machinery to be able to resolve types of different IR nodes. The next diff will add an analyzer rule to do that. Test Plan: Added type_resolution tests. Reviewers: philkuz, nserrino, #engineering Reviewed By: nserrino, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D5220 GitOrigin-RevId: 1fcbde270ea6026c8b169a5972711bd1ca16b03a
-
- 22 May, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: Wanted to do some of this for a while, just paying my dues Test Plan: tested and passes so far Reviewers: nserrino, jamesbartlett, #engineering Reviewed By: nserrino, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D4944 GitOrigin-RevId: e6cfac38d5780331da94704c1f433aadb507d0b6
-
- 21 May, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: Filter push down fix that adds an extra stopping condition for push down ( if parent.children().size() > 1 ) and moves the rule into the optimizer. Test Plan: tested in optimizer and seems to work, previously failing query works as well Reviewers: nserrino, jamesbartlett, #engineering Reviewed By: nserrino, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D4914 GitOrigin-RevId: fa2f9fc4bb02ead7cf8b50831710aa261fb01d2e
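The stopping condition can be pictured with a small Python sketch. It assumes a simplified plan where "parent" means the upstream operator, a filter may hop over any map, and column dependencies are ignored. The `Op` class and `pushdown_target` are hypothetical and are not the rule's actual implementation.

```python
class Op:
    """A plan operator with one upstream parent and any number of children."""
    def __init__(self, name, kind, parent=None):
        self.name, self.kind, self.parent, self.children = name, kind, parent, []
        if parent is not None:
            parent.children.append(self)

def pushdown_target(filter_op):
    """Return the ancestor that the filter may be placed directly below."""
    node = filter_op.parent
    # Keep climbing past maps, but stop at a node that feeds more than one child:
    # moving the filter above it would also filter the other branch.
    while node.kind == "map" and len(node.children) <= 1 and node.parent is not None:
        node = node.parent
    return node
```

With `src -> m1 -> m2 -> filter`, the filter can move to just below `src`. Once `m1` gains a second child, the climb stops at `m1`.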
-
- 19 May, 2020 2 commits
-
-
Natalie Serrino authored
Summary: TSIA, depends on D4882. Test Plan: existing should pass Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-1928 Differential Revision: https://phab.corp.pixielabs.ai/D4883 GitOrigin-RevId: 20d7ae02072b0604d3d592e1a363140a10693d14
-
Natalie Serrino authored
Summary: Depends on D4877. This diff completes the refactor of how metadata is handled in the compiler. It is converted to a metadata generating function which has annotations attached to it labeling what kind of metadata type it is. These annotations are passed down when the output of that func is assigned to another column name or used in a group by clause or filter. These annotations are the basis for agent metadata filtering. This diff removes the old behavior where metadata columns were created via a map and called _attr_<metadata_name>. Next diff I will delete the obsolete classes. Test Plan: added/existing Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-1928, PP-1916 Differential Revision: https://phab.corp.pixielabs.ai/D4882 GitOrigin-RevId: 0ea09742110cbd31e5ee86785584f67535eccc6e
-
- 18 May, 2020 1 commit
-
-
Natalie Serrino authored
Summary: Depends on D4868. This is part of a series of changes to make the way the compiler handles metadata work a bit more easily with agent metadata pruning. The new flow will remove the _attr columns created by maps, and MetadataIRs will now be considered intermediate nodes that are compiled to a func that generates them.
New analyzer flow:
1. MetadataIR has its MetadataProperty set in ResolveMetadataPropertyRule.
2. MetadataIR is converted to the func that creates it (upid_to_pod_name(upid) for example) in ConvertMetadataRule. ConvertMetadataRule also sets the metadata_type annotation on the output func.
3. PropagateExpressionAnnotationsRule (D4868) will propagate the metadata_type annotation in the generated func to all of the places that generated func is renamed or set to a column. For example, if that func is used to produce a column, which is then renamed and used as a group by column, the annotations will follow it as long as the column is intact.
Then, consumers of the IR that need to know about metadata type of an expression (such as metadata agent pruning) only have to look at the metadata type annotation. This avoids the problem of those consumers needing to check for the various cases that exist today.
Test Plan: added Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-1928 Differential Revision: https://phab.corp.pixielabs.ai/D4877 GitOrigin-RevId: 67e5ca2faa13e816b39f18d668e9cbd1766d893a
-
- 16 May, 2020 1 commit
-
-
Natalie Serrino authored
PP-1928: Add annotations struct to ExpressionIR and rule to propagate annotations between operators. Summary: We are moving toward a new model for handling metadata about expressions. This is the first diff in a sequence that will replace the existing handling of metadata in our compiler. We want to track annotations such as metadata type on ExpressionIRs. That way, it is easy for a given consumer of an ExpressionIR to know if that ExpressionIR represents a metadata field, even if it originated from a column that has been processed and renamed since assignment. This annotations concept can be extended to other types of annotations in the future, but is limited to metadata for now. (The major use case right now is generalizing agent pruning). If a column has annotations, and then is reassigned or processed by a downstream operator, the output column should potentially share the same annotations depending on the context. If it is a simple name reassignment in a map, then the annotations from the input column should be copied over. There are some more complex cases with things like union (where all of the input columns need to agree on a particular annotation for it to go in the output). As a result, this diff adds a rule that computes downstream annotations for columns derived from other columns or expressions that have annotations associated with them. Next up will be to tie the rule into the analyzer, and to set the metadata type annotation for metadata fields produced by expressions such as df.ctx['pod_id']. Then, some of the existing metadata logic/classes will be removed. Test Plan: added Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-1928 Differential Revision: https://phab.corp.pixielabs.ai/D4868 GitOrigin-RevId: 49f425b6e6ada2e2b3f7ec1a5c9ff4936d91bc6f
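The two propagation cases named in this commit (a rename in a map, agreement across union inputs) might look roughly like the Python sketch below. Annotations are modeled as plain strings keyed by column name. The function names and the `"POD_NAME"` style values are assumptions for illustration, not the compiler's types.

```python
def propagate_map(input_annotations, renames):
    """Copy annotations from each input column to the output column that renames it.

    `renames` maps output column name -> input column name.
    """
    return {
        out: input_annotations[src]
        for out, src in renames.items()
        if src in input_annotations
    }

def propagate_union(parent_annotations, column):
    """Keep an annotation on a union output only if every parent agrees on it."""
    values = [annotations.get(column) for annotations in parent_annotations]
    first = values[0]
    if first is not None and all(value == first for value in values):
        return first
    return None
```

A consumer such as agent pruning would then read only the annotation on the output column, regardless of how many renames happened upstream.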
-
- 15 May, 2020 1 commit
-
-
Natalie Serrino authored
Summary: These were added based off of a prior design for the way that pruning agent plans based on pod, service, etc filtering would work. As a result they are obsolete at least for now. Test Plan: n/a Reviewers: philkuz, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D4848 GitOrigin-RevId: a223eb5249c65be2dbd40249d988a53b29b8b497
-
- 13 May, 2020 1 commit
-
-
Natalie Serrino authored
Summary: TSIA. This code still doesn't support joins or unions for the pushdown, that will come in a subsequent diff. Test Plan: existing/added Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-1811 Differential Revision: https://phab.corp.pixielabs.ai/D4762 GitOrigin-RevId: 8d58c056fc49bc62f48ba7cc71a606e1ac060087
-
- 08 May, 2020 1 commit
-
-
Natalie Serrino authored
Summary: We want to move filters as early in the query as possible for efficiency reasons. This diff implements a rule to do that, which only works on certain kinds of operators for now. This diff has a small refactor of a utility in ir_nodes.h so that it could be used in rules.cc as well. Next up will be supporting pushing filters past aggs, joins, and unions. Test Plan: added Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-1811 Differential Revision: https://phab.corp.pixielabs.ai/D4665 GitOrigin-RevId: 4e829528592397c3ea83ff998d32d20c00869fbb
-
- 22 Apr, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: EXPECT_MATCH is something I've wanted to do, but never did. It's a nice replacement for EXPECT_TRUE(Match(...)) because it's shorter and provides a better error message. Test Plan: Tested in follow up diff Reviewers: nserrino, zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D4468 GitOrigin-RevId: 4062875b21587b7b91f73ad0df137203212418e9
-
- 22 Mar, 2020 1 commit
-
-
Phillip Kuznetsov authored
[PL-1638] (Compiler) Change Distributed splitter to rely on Independent Graphs at the Operator Level Summary: Depends on D4004. Planner splits the logical plan into the data source side (~= pems) and processor side(= kelvin) using the Dag->IndependentGraphs. If we reuse a non-op variable on both sides of this split, then the IndependentGraphs algorithm will return the entire graph as one entity which is incorrect. Instead had to make a new Independent Graphs algorithm that only looks at Operators as the actual graph. Test Plan: old queries work, problematic query added as a new test in logical planner Reviewers: nserrino, jamesbartlett, #engineering Reviewed By: nserrino, #engineering JIRA Issues: PL-1638 Differential Revision: https://phab.corp.pixielabs.ai/D4024 GitOrigin-RevId: b2b3a9955ec17b3d8bd52434fde91b6b754b747d
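The difference between the two notions of "independent graphs" can be shown with a short Python sketch: connected components are computed over operator nodes only, so a shared non-operator value no longer glues the two sides together. The node names and the `is_operator` predicate are hypothetical, and this is not the Dag implementation itself.

```python
def independent_graphs(nodes, edges, is_operator):
    """Connected components of the plan, using only edges between operator nodes."""
    adjacency = {n: set() for n in nodes if is_operator(n)}
    for a, b in edges:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components
```

If `const` is a non-operator value used by both `map1` and `map2`, treating every node as part of the graph yields one component, while the operator-only view yields two.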
-
- 20 Feb, 2020 1 commit
-
-
Natalie Serrino authored
Summary: We had a problem with filters where if you had two filters in a row, where both filters used columns for the filter condition that were not used downstream anywhere, you would end up with misaligned output relations. We don't need to output columns that are only used for evaluating a filter condition, so this diff updates the logic to take advantage of column selection in the filter node. Test Plan: added Reviewers: philkuz, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1497 Differential Revision: https://phab.corp.pixielabs.ai/D3551 GitOrigin-RevId: 070a55f15d43b39c3b59df403b47f6095ab590b3
-
- 17 Feb, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: Depends on D3526. Final diff in the refactor -> rename namespaces. This logically sorts everything in the planner. Test Plan: pass tests from before, hope nothing breaks. Reviewers: nserrino, zasgar, michelle, #engineering, jamesbartlett Reviewed By: jamesbartlett Differential Revision: https://phab.corp.pixielabs.ai/D3528 GitOrigin-RevId: 4a863b5fe2fe13a397f95e458bdcc4a2d074657c
-
- 16 Feb, 2020 3 commits
-
-
Phillip Kuznetsov authored
Summary: Depends on D3525. Follow up diff will have a rename of the namespaces. Test Plan: tests pass, just a refactor Reviewers: #engineering, zasgar, jamesbartlett, nserrino Reviewed By: jamesbartlett JIRA Issues: PL-1473 Differential Revision: https://phab.corp.pixielabs.ai/D3526 GitOrigin-RevId: 610a06247db629d186b945ed520e20a22fc2b139
-
Phillip Kuznetsov authored
Summary: Depends on D3524. TSIA Test Plan: everything tests and works. Reviewers: nserrino, zasgar, jamesbartlett, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3525 GitOrigin-RevId: bde2e5053ed366540dee8768cde31d8990b118f7
-
Phillip Kuznetsov authored
Summary: Part of the cleanup of the compiler directory, moving distributed to its own directory. Eventually we'll move compiler components to its own directory as well. Test Plan: tested to make sure everything works after making this move. Reviewers: nserrino, zasgar, jamesbartlett, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3524 GitOrigin-RevId: 8bbbda506c4bb14918742a23489495e53502888c
-
- 13 Feb, 2020 1 commit
-
-
Natalie Serrino authored
Summary: Depends on D3495. This rule adds a limit above memory sinks when the node above it isn't a limit. If the node above it is a limit, it edits the limit where appropriate. Next will be to hook this into the analyzer. Test Plan: added tests Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1452 Differential Revision: https://phab.corp.pixielabs.ai/D3496 GitOrigin-RevId: 8416c8cd9037895f877d22a4a3abeaa5ac4ec8dd
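A rough Python sketch of the behavior described here, using a flat list of `(kind, value)` operators that ends in a memory sink. `DEFAULT_LIMIT` is a made-up value, and reading "edits the limit where appropriate" as "tighten it when it exceeds the cap" is an assumption, not something the commit states.

```python
DEFAULT_LIMIT = 10000  # hypothetical default, not taken from the source

def add_limit_to_sink(plan, max_rows=DEFAULT_LIMIT):
    """Ensure the operator directly above the sink is a limit of at most `max_rows`."""
    *upstream, sink = plan
    if upstream and upstream[-1][0] == "limit":
        # Assumed policy: an existing limit is only tightened, never loosened.
        upstream[-1] = ("limit", min(upstream[-1][1], max_rows))
    else:
        upstream.append(("limit", max_rows))
    return upstream + [sink]
```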
-
- 11 Feb, 2020 1 commit
-
-
James Bartlett authored
Summary: Adds a .rolling function to support windowed aggregates, e.g.
```
t1 = px.DataFrame(..., select=['time_', 'col1'])
t1 = t1.rolling('2s').agg(...)
```
The rolling function also supports an `on` parameter to specify which column to window on; however, it currently only supports windowing on `time_`. `t1.rolling('2s')` is equivalent to `t1.rolling('2s', on='time_')`, and `t1.rolling(10, on='col1')` is currently not supported but will be in the future. Currently, the proto spec is undefined, so attempting to compile to a proto will result in an Unimplemented error. This PR also adds a GroupAcceptorIR that serves as an interface for any IR that can accept groups from a GroupBy op. This PR changes BlockingAgg to subclass GroupAcceptorIR, and the new RollingIR is also a subclass of GroupAcceptorIR. This allows for both `t1.groupby(...).rolling(...).agg(...)` and `t1.groupby(...).agg(...)` to be handled by the Merge rule. Additionally, this PR adds support for the RollingIR window size parameter to accept time strings as above, or compile time expr evaluation such as `t1.rolling(1 + px.now())`. Currently, the RollingIR is left in the graph, but once the spec for the proto is known it will likely need to be merged into the Agg.
Test Plan:
- Add tests to ensure new `MergeGroupByIntoGroupAcceptorRule` works for both `RollingIR` and `BlockingAggIR`.
- Add tests to ensure transition from `ConvertMemSourceStringTimesRule` to `ConvertStringTimeRule` still works for mem source and additionally works for Rolling now.
- Add tests that RollingIR node gets created properly.
- Add tests to check that compile time expr eval works for new Rolling op.
Reviewers: #engineering, philkuz, nserrino Reviewed By: #engineering, nserrino Subscribers: nserrino, philkuz JIRA Issues: PL-754 Differential Revision: https://phab.corp.pixielabs.ai/D3365 GitOrigin-RevId: c59cfb51bb9f11d444ee6551c6092ed17710aa22
-
- 06 Feb, 2020 1 commit
-
-
James Bartlett authored
Test Plan: Tests pass. Reviewers: #engineering, philkuz, nserrino, zasgar Reviewed By: #engineering, zasgar Differential Revision: https://phab.corp.pixielabs.ai/D3440 GitOrigin-RevId: 55df3886123156ab3da4216225fcaed3c12d454c
-
- 30 Jan, 2020 1 commit
-
-
James Bartlett authored
Summary: Adds a compiler rule that determines which operators are not connected to a MemorySink, and removes those operators. This rule is run before the rule that prunes unused columns. Test Plan: Added a compiler level test for this behaviour, as well as two rule level tests. Reviewers: #engineering, nserrino, philkuz Reviewed By: #engineering, philkuz JIRA Issues: PL-1336 Differential Revision: https://phab.corp.pixielabs.ai/D3299 GitOrigin-RevId: 0cf40938fb3c6d8e104a8de9e6cecc27f557f72d
-
- 27 Jan, 2020 1 commit
-
-
Natalie Serrino authored
Summary: This adds an automatic check on rules_test to ensure that each of the applied rules cleans up after itself. This also comes along with some cleanup of the way ir_nodes handle certain method calls. depends on D3302. Test Plan: existing Reviewers: #engineering, philkuz, jamesbartlett Reviewed By: #engineering, philkuz JIRA Issues: PL-1349 Differential Revision: https://phab.corp.pixielabs.ai/D3303 GitOrigin-RevId: 6aab4d98788b52a18eb4f1c35764a554cb783b5c
-
- 24 Jan, 2020 2 commits
-
-
Natalie Serrino authored
Summary: Add a utility for nodes to remove their prior children if and only if those children have no other parents. Have rules replace DeferNodeDeletion with DeleteNode, and support skipping deleted nodes in rule execution. Test Plan: existing Reviewers: philkuz, jamesbartlett, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1349 Differential Revision: https://phab.corp.pixielabs.ai/D3292 GitOrigin-RevId: 371fa81264dece4f74c6d7a43e546c92e7671c75
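The "remove prior children only if they have no other parents" check can be sketched in Python as below. The graph is a plain dict of node to child set, and `replace_children` is a hypothetical helper. The sketch is deliberately non-recursive, so grandchildren of a deleted node are not revisited.

```python
def replace_children(graph, node, new_children):
    """Point `node` at `new_children`; delete prior children nothing else references."""
    old_children = graph[node]
    graph[node] = set(new_children)
    for child in old_children - graph[node]:
        # A child survives if any remaining node still lists it as a child.
        still_referenced = any(child in kids for kids in graph.values())
        if not still_referenced:
            del graph[child]
    return graph
```

This is the property that makes deletion safe for nodes with multiple parents: a shared child is kept until its last parent lets go of it.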
-
Natalie Serrino authored
Summary: DeleteNodeAndChildren is not used and it is also not safe for nodes with multiple parents (ExpressionIR with multiple parents got introduced after it was written). Also DeleteNode was leaving the nodes in id_to_nodes_map which led to inconsistent results between the IR dag and IR node map (each of which can be checked depending on the situation). Test Plan: edited a test to work with these fixes Reviewers: philkuz, jamesbartlett, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3283 GitOrigin-RevId: 24290054ab30dc4454a300562a65eae65428d6b2
-
- 25 Jan, 2020 1 commit
-
-
Natalie Serrino authored
Summary: Filter uses the same output relation as the input relation. As a result, if we had a plan like this: src -> filter -> map where map produced columns col1, col2, but filter operated on col0. The way it was previously implemented, col0 would be pruned out by our new rule, thus breaking the filter condition. What we need to do is have filter output col0, col1, col2 so it can still use col0, and then have col0 get pruned by whatever downstream node doesn't need it. This diff does that. Test Plan: added Reviewers: philkuz, jamesbartlett, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3290 GitOrigin-RevId: 5fe09fa45eafdcc52d8cafc2af3567ea51cd5ebc
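The `src -> filter -> map` case above can be worked through with a small Python sketch that walks the plan from sink to source and records which columns each operator must receive. The operator encoding and `required_columns` are hypothetical, and only filters and maps are modeled.

```python
def required_columns(ops, output_columns):
    """Return, per operator, the input columns it must receive.

    `ops` is a source-to-sink list of (kind, columns_read) pairs.
    """
    needed = set(output_columns)
    per_op = []
    for kind, columns_read in reversed(ops):
        if kind == "filter":
            # A filter's output relation equals its input relation, so it passes
            # through what downstream needs plus its own predicate columns.
            needed = needed | set(columns_read)
        elif kind == "map":
            # A map only needs the columns its expressions read.
            needed = set(columns_read)
        per_op.append((kind, sorted(needed)))
    return list(reversed(per_op))
```

For a filter on `col0` followed by a map reading `col1` and `col2`, the filter keeps all three columns and the map is where `col0` drops out.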
-
- 24 Jan, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: PL is old news, px is the new news. Get out and use px. Test Plan: all tests pass with the new changes, no new functionality. Reviewers: zasgar, nserrino, michelle, jamesbartlett, #engineering Reviewed By: nserrino, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3273 GitOrigin-RevId: 970e66dcc75ecbf455319819f348d1b9daa93e0d
-
- 23 Jan, 2020 1 commit
-
-
Natalie Serrino authored
Summary: We don't want to have random stray nodes that are lying around and not used by the plan. We can consider adding it as a DCHECK to other rule batches or the ast_visitor code if we don't want to rely on it for pruning and instead expect the other rules to take care of themselves. In that case we would run the rule and DCHECK that it didn't do any work on the graph, and if it did we would know that we were leaving stray nodes. Test Plan: added Reviewers: philkuz, zasgar, jamesbartlett, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3271 GitOrigin-RevId: 83f0db4325f0aa6228e9190af9de86c526ec814f
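One way to picture such a cleanup, and the DCHECK idea of verifying that it did no work, is the Python sketch below. It assumes "stray" means "not reachable from any sink by walking child to parent edges". That definition and the names are illustrative guesses rather than the rule's actual criteria.

```python
def prune_stray_nodes(parents, sinks):
    """Keep only nodes reachable from a sink by walking child -> parent edges.

    `parents` maps each node to the list of nodes it reads from.
    """
    keep, stack = set(), list(sinks)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, []))
    return {node: ps for node, ps in parents.items() if node in keep}

def left_no_strays(parents, sinks):
    """DCHECK-style check: True when pruning would not change the graph."""
    return prune_stray_nodes(parents, sinks) == parents
```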
-
- 21 Jan, 2020 2 commits
-
-
Natalie Serrino authored
Summary: Depends on D3224, D3230, D3231. Test Plan: added tests for various cases Reviewers: philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1319 Differential Revision: https://phab.corp.pixielabs.ai/D3236 GitOrigin-RevId: 9a5ec28ff446a926b7bfc74c6190f5adb0b5ced4
-
Natalie Serrino authored
Summary: Currently the output relation of a sink is just taken from the parent, however sinks do have a field called out_columns_ which is meant to support selecting a subset of output columns. PL-1319 uncovered these two cases in some of its logic, so here is part of the work in solving PL-1197 which supports specifying output columns in pl.display in the QL. Test Plan: added Reviewers: philkuz, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1197, PL-1319 Differential Revision: https://phab.corp.pixielabs.ai/D3241 GitOrigin-RevId: 44251ed7cff6b191a96eb0c82b7ce6a58b04ad62
-
- 17 Jan, 2020 1 commit
-
-
Natalie Serrino authored
Summary: PL-1319 will add in a compiler optimization that prunes columns that are unnecessary to the script output. In order to do this nicely, it makes sense for the resolution of a column to its ultimate index in the plan to happen as late in the game as possible, as the input relation to a given operator may shift as columns are pruned. UnionIR previously stored column indexes to refer to columns, rather than a ColumnIR. This is error prone given the upcoming optimization, because those indexes could become stale. We want all IRNodes to use ColumnIR types when referring to columns so that it's easier for us to figure out what is able to be pruned. As a result, I moved UnionIR to using ColumnIR instead of indexes. This exposed an issue where we were not resolving column indexes for columns/operators added after the analyzer phase. As a result, a step in the distributed analyzer was added to resolve column indexes, so that nodes that were added to the plan after the initial analyzer phase still have their columns resolved. Depends on D3171 and D1319. Test Plan: added tests Reviewers: philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1317, PL-1319 Differential Revision: https://phab.corp.pixielabs.ai/D3202 GitOrigin-RevId: 4b6c3ef40d3a44b1bcd32d0500682d5ff2f6fdf5
-
- 15 Jan, 2020 2 commits
-
-
Natalie Serrino authored
Summary: This refactor is in service of PL-1319, which will add an analyzer phase that prunes unused columns from the plan. In order to do that, we want to defer setting the column index as long as possible, because the input/output relation of each operator may change when its columns are pruned. During the analyzer phase, we want to entirely deal in terms of column names, rather than mixing both, only moving over to column indexes at the end. Right now, column type and column index were being set at the same point, but column type is needed for many phases of the analyzer, and column index is only needed at the very end when ToProto is called on the operators. Therefore, these get set separately and I added a rule where the column index is resolved at the final step of the analyzer. Test Plan: existing should pass Reviewers: philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1319 Differential Revision: https://phab.corp.pixielabs.ai/D3171 GitOrigin-RevId: 11bcb7fe9f663a91097a2c153e78fef555d53eac
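The reason for resolving indexes last can be seen in a tiny Python sketch: an index computed before pruning goes stale, while a name resolved against the final relation does not. `resolve_column_indexes` is a hypothetical helper and the column names are illustrative.

```python
def resolve_column_indexes(relation, referenced):
    """Map each referenced column name to its position in the final relation."""
    return {name: relation.index(name) for name in referenced}

# Before pruning, col1 sits at index 2.
before = ["time_", "col0", "col1"]
early = resolve_column_indexes(before, ["col1"])

# After pruning the unused col0, the same name resolves to index 1,
# so an index captured early would point at the wrong column.
after = [c for c in before if c != "col0"]
late = resolve_column_indexes(after, ["col1"])
```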
-
Natalie Serrino authored
Summary: PL-1319 as well as PL-1317 will require the addition of a distributed analyzer, which will share many characteristics with the current analyzer. We want to be able to have the same logic that exists for rules on IRs be able to execute on DistributedPlans as well. Analyzer: traverses IR, applies Rules to IRNodes. DistributedAnalyzer: traverses DistributedPlan, applies DistributedRules to CarnotInstances. We want the rule executor and the graph walking stuff, as well as the patterns for Apply() on Rules, to be shared by both of these cases. The first use case for a DistributedAnalyzer is to have IR rules executed on each of the Plans for each CarnotInstance, so support for that kind of rule is added in this diff. However there will be other kinds of DistributedRules that actually modify the top-level DAG of DistributedPlan in the future. For example, an optimization that removes an entire CarnotInstance from the DistributedPlan because a filter condition causes there to be no data present on a given node that will be included on the output. Next step will be to edit RuleExecutor to be generalized to be able to use Rules or DistributedRules. After that, the DistributedAnalyzer will be added to the codebase. Then, a rule which needs to be added for unions post-distributed splitting will be added to the DistributedAnalyzer. Eventually, much of the distributed splitting/stitching logic can be re-articulated as part of the DistributedAnalyzer. Test Plan: existing, added Reviewers: philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PL-1317, PL-1319 Differential Revision: https://phab.corp.pixielabs.ai/D3191 GitOrigin-RevId: 6e42870fbb51c0454a6248c078c1011b71ae05a9
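The first kind of distributed rule described here, an IR rule run on each instance's plan, could be sketched in Python as follows. The distributed plan is modeled as a dict from instance name to a list of operators. `apply_to_each_instance` and the instance names are hypothetical and do not mirror the C++ interfaces.

```python
def apply_to_each_instance(distributed_plan, ir_rule):
    """Run an IR-level rule on every instance's plan; report whether any plan changed."""
    changed = False
    for instance, plan in distributed_plan.items():
        new_plan = ir_rule(plan)
        if new_plan != plan:
            # Replacing the value of an existing key is safe during iteration.
            distributed_plan[instance] = new_plan
            changed = True
    return changed

def drop_repeated_ops(plan):
    """Toy IR rule: collapse consecutive duplicate operators."""
    return [op for i, op in enumerate(plan) if i == 0 or op != plan[i - 1]]
```

Returning a "changed" flag matches the usual pattern of re-running a rule batch until it reaches a fixed point.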
-
- 11 Jan, 2020 1 commit
-
-
Zain Asgar authored
Summary: Not used except in test, probably left over. Test Plan: bazel test //... Reviewers: michelle, nserrino, philkuz, #engineering Reviewed By: nserrino, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3142 GitOrigin-RevId: 9ef59aff6a75b30e97ae059a417fdb066698e28e
-
- 09 Jan, 2020 1 commit
-
-
Phillip Kuznetsov authored
Revert the changes to the RegistryInfo pointer in compiler state because I misjudged the lifetime of that object. Summary: I misjudged the lifetime of the RegistryInfo object. It turns out that it should last longer than the compiler state object, something that will be necessary with upcoming changes to the LogicalPlanner object. Test Plan: tests pass with changes, this is a nearly 1 to 1 reversion of the changes in a prev commit. Reviewers: zasgar, nserrino, michelle, #engineering Reviewed By: michelle, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D3111 GitOrigin-RevId: e2e0666d7da34d9eea44e3c358eb3f3164b7251e
-