This project is mirrored from https://gitee.com/cowcomic/pixie.git.
- 12 Sep, 2020 2 commits
-
-
Michelle Nguyen authored
Summary: Updated the graph configs to speed things up, e.g. the number of stabilization iterations, smooth edges, and improved layout (which actually console.logs that you should disable it for large graphs). Clustered graphs looked like they were having a problem since we were setting the cluster ID when it was already set to what we wanted. There is still some slowness, but at least I haven't had anything hang.
Test Plan: Tried it with a customer's non-prod clusters, which have pretty big graphs.
Reviewers: zasgar, nserrino, #engineering
Reviewed By: zasgar, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6215
GitOrigin-RevId: c6e57465357bc3207b78bc9d3dab2a19f0511f34
-
Michelle Nguyen authored
Summary: We were seeing issues where GQL requests were getting 502s while a gRPC request was being made. This is because we can't serve both requests on the same stream in nginx.
Test Plan: Tested on staging with something polling GQL and something else polling a gRPC request.
Reviewers: zasgar, #engineering
Reviewed By: zasgar, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6214
GitOrigin-RevId: 9a253e182d3a1830dd815ba48c47b6eccc5ffc3e
-
- 11 Sep, 2020 1 commit
-
-
Yaxiong Zhao authored
Summary: Previously, the function returned an error status. This avoids a confusing error when deploying a probe that does not actually probe arguments with unsupported types. GetFunctionArgInfo() is called to return the ArgInfo of all of the arguments of a function, regardless of whether the argument is probed or not. This fix tries to minimize the scope of changes. A possible alternative is to only resolve ArgInfo for each arg expression, which appears more intrusive.
Test Plan: Jenkins
Reviewers: oazizi, #engineering
Reviewed By: oazizi, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6203
GitOrigin-RevId: 6f18e52e97d0be957b1c038e0229ca4557f9b1f6
-
- 09 Sep, 2020 1 commit
-
-
Omid Azizi authored
Summary: For readability Test Plan: Existing tests Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6188 GitOrigin-RevId: bc60521e0983ea6debac62555240e035b98a42d5
-
- 11 Sep, 2020 4 commits
-
-
Omid Azizi authored
Summary: Restoring the BPFTrace submodule and build. Test Plan: Manual Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6211 GitOrigin-RevId: c4fab9013bda7dd6ea92070da2cb864f430510bd
-
Phillip Kuznetsov authored
Summary: px.equals_any replaces a long chain of OR'd equality comparisons with a single call:
```
df = df[px.equals_any(df.remote_addr, ['10.0.0.1', '10.0.0.2', '10.0.0.3'])]
```
Test Plan: Added a compiler test, since the feature requires multiple steps through the compiler.
Reviewers: nserrino, #engineering, zasgar
Reviewed By: #engineering, zasgar
Differential Revision: https://phab.corp.pixielabs.ai/D6209
GitOrigin-RevId: ff83c3485393751f7c1a62732ce3b2e6edc2298d
-
Michelle Nguyen authored
Summary: We're running into a bug where our GetAgentUpdates gRPC streams are never terminated if an HTTP/2 timeout is hit. We don't want all of these zombie streams to keep running in the metadata service, since they can build up, so we enforce that only one is running at a time (since, currently, only one needs to run at a time).
Test Plan: Ran skaffold with an HTTP/2 timeout of 10s to ensure that old streams are terminated.
Reviewers: nserrino, zasgar, #engineering
Reviewed By: nserrino, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6207
GitOrigin-RevId: 11008d1eec10e8ba74bd43791572dd1c2f1ec0e8
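The enforcement described here can be sketched in Go: the server remembers a cancel function for the currently active stream and cancels it before admitting a new one. The `Server` type, its field names, and the `registerStream` helper below are illustrative assumptions, not the actual metadata service code.

```go
package agentstream

import (
	"context"
	"sync"
)

// Server tracks at most one active GetAgentUpdates stream at a time.
// Names here are invented for illustration.
type Server struct {
	mu         sync.Mutex
	cancelPrev context.CancelFunc // cancels the currently running stream, if any
}

// registerStream cancels any previously registered stream and returns a
// context that the new stream handler should use for its send loop.
func (s *Server) registerStream(parent context.Context) context.Context {
	s.mu.Lock()
	defer s.mu.Unlock()

	if s.cancelPrev != nil {
		// Terminate the zombie stream so it stops consuming updates.
		s.cancelPrev()
	}

	ctx, cancel := context.WithCancel(parent)
	s.cancelPrev = cancel
	return ctx
}
```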
-
Michelle Nguyen authored
Summary: This doesn't happen during a normal deploy/update process, only when running on skaffold and two metadata pods exist at the same time, with the older one as leader. Here's what could happen:
- Old metadata is running.
- New metadata is running and initializes its agent queues by doing a GetActiveAgents(). This hits etcd since there's nothing in the cache yet.
- Old metadata, still not deleted yet, registers the new Kelvin starting up and writes it to etcd.
- New metadata finishes initializing. It does not know anything about the Kelvin, since it called GetActiveAgents before the Kelvin was written.
- Old metadata dies and stops responding to the Kelvin. The Kelvin dies.
- A new Kelvin starts up. Since the new metadata doesn't know about the old Kelvin, it never gets cleared up.
Test Plan: Ran skaffold a bunch of times; verified there were no zombie agents in the agent_status query.
Reviewers: nserrino, zasgar, #engineering
Reviewed By: nserrino, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6205
GitOrigin-RevId: 1bc5669d86cdc078362d4d0411582cffcddec650
-
- 10 Sep, 2020 2 commits
-
-
Michelle Nguyen authored
Summary: Cleaned up the agentHandler stop logic to use WaitGroups rather than the quitDone channel. Also updated the logic to ensure that we can't try writing to the quitCh when it is already closed. The following is something that could happen:
- Stop() is called because a new agent with hostname X is trying to register and an old agent with hostname X is already registered.
- Stop() closes the quitCh.
- The agent dies before it is registered, for whatever reason.
- A new agent starts up, and Stop() is called again for the agent with hostname X.
- Stop() tries to write to the quitCh.
- Panic.
Test Plan: Ran skaffold and deleted PEMs to confirm that things correctly get deleted/registered.
Reviewers: zasgar, #engineering, nserrino
Reviewed By: #engineering, nserrino
Differential Revision: https://phab.corp.pixielabs.ai/D6199
GitOrigin-RevId: 05abab4144f55c521900bc5df83f0ae064181542
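A minimal Go sketch of the pattern this commit describes: quit is signaled by closing quitCh exactly once (guarded by sync.Once), and a WaitGroup replaces the quitDone channel. The type and method names are hypothetical stand-ins for the real agentHandler, not its actual implementation.

```go
package agenthandler

import "sync"

// AgentHandler owns the goroutines for a single agent. Illustrative only.
type AgentHandler struct {
	msgCh    chan []byte
	quitCh   chan struct{}
	stopOnce sync.Once      // ensures quitCh is closed at most once
	wg       sync.WaitGroup // replaces the old quitDone channel
}

func New() *AgentHandler {
	return &AgentHandler{
		msgCh:  make(chan []byte, 256),
		quitCh: make(chan struct{}),
	}
}

// Start launches the per-agent message loop.
func (h *AgentHandler) Start() {
	h.wg.Add(1)
	go func() {
		defer h.wg.Done()
		for {
			select {
			case <-h.quitCh:
				return
			case msg := <-h.msgCh:
				_ = msg // ... handle the agent message ...
			}
		}
	}()
}

// Stop is safe to call more than once: quit is signaled by closing the
// channel (never by sending on it), and sync.Once prevents a double close,
// so the "send on closed channel" panic described above can't happen.
func (h *AgentHandler) Stop() {
	h.stopOnce.Do(func() { close(h.quitCh) })
	h.wg.Wait()
}
```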
-
Natalie Serrino authored
Summary: Was debugging an issue where the control plane pod status createdAtMs was returning a large negative number. This didn't turn out to be the issue, but I wrote a test for the scanner of the control plane pod statuses class to see if it was causing the problem by mangling the data.
Test Plan: The test.
Reviewers: michelle, zasgar, #engineering
Reviewed By: michelle, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6194
GitOrigin-RevId: 01698045cb6476153465a12038091fa61e5de094
-
- 04 Sep, 2020 1 commit
-
-
Zain Asgar authored
Summary: Added caching to help the performance of nslookup. We also need to batch/async these to improve performance, which will be the next optimization.
Test Plan: N/A, we don't have UDTF tests yet.
Reviewers: michelle, #engineering
Reviewed By: michelle, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6204
GitOrigin-RevId: 586acab78c378bf70cc10a2162d1d755c268c032
-
- 11 Sep, 2020 2 commits
-
-
Phillip Kuznetsov authored
Summary: TSIA Test Plan: Tested on dev-cluster-philkuz, simple change shouldn't break things. Reviewers: zasgar, oazizi, #engineering Reviewed By: oazizi, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6206 GitOrigin-RevId: 233c25927ded78e0d641d656bd05f1abc757d03c
-
Natalie Serrino authored
Summary: We have a problem between the query broker and the metadata service across the GetAgentUpdates API. This API was originally designed for a single consumer (the singleton query broker). We are running into an issue where the stream between them times out from the query broker's perspective. As a result, the query broker decides to reconnect. However, despite the gRPC error on the query broker side, from the metadata service's perspective the first stream stays alive for some reason, and the second stream gets connected too. GetAgentUpdates was only designed to support a single consumer, so both streams update the state, leading to inconsistent results on the non-zombie stream that has reconnected from the query broker.
@michelle is looking into the zombie streams issue, but for now this change allows the metadata service's GetAgentUpdates to support multiple consumers. This is a step that is necessary for us anyway once we want to support multiple metadata services and query brokers, and should put us a bit closer to the eventual design with versioned updates.
Test Plan: Edited existing tests.
Reviewers: michelle, zasgar, #engineering
Reviewed By: michelle, #engineering
Subscribers: michelle
Differential Revision: https://phab.corp.pixielabs.ai/D6202
GitOrigin-RevId: 0830feef6adddbe7e105ba34446e3fd04ca3040c
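One way to picture the multi-consumer version of GetAgentUpdates is a small fan-out broker where each consumer subscribes to its own channel. This Go sketch is an assumption about the shape of the change, not the metadata service's actual code; AgentUpdate, Broker, and the channel sizes are invented for illustration.

```go
package agentupdates

import "sync"

// AgentUpdate stands in for the real update message.
type AgentUpdate struct {
	AgentID string
	Payload []byte
}

// Broker fans agent updates out to any number of GetAgentUpdates consumers,
// instead of assuming a single query-broker stream.
type Broker struct {
	mu        sync.Mutex
	consumers map[int]chan AgentUpdate
	nextID    int
}

func NewBroker() *Broker {
	return &Broker{consumers: make(map[int]chan AgentUpdate)}
}

// Subscribe registers a new consumer and returns its channel plus an
// unsubscribe function to call when the stream ends.
func (b *Broker) Subscribe() (<-chan AgentUpdate, func()) {
	b.mu.Lock()
	defer b.mu.Unlock()
	id := b.nextID
	b.nextID++
	ch := make(chan AgentUpdate, 1024)
	b.consumers[id] = ch
	return ch, func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		delete(b.consumers, id)
	}
}

// Publish delivers an update to every registered consumer, dropping it for a
// consumer that has fallen too far behind rather than blocking the others.
func (b *Broker) Publish(u AgentUpdate) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.consumers {
		select {
		case ch <- u:
		default: // consumer is too slow; skip rather than block
		}
	}
}
```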
-
- 10 Sep, 2020 4 commits
-
-
Michelle Nguyen authored
Summary: We need this label so that the pod is picked up in the cloud connector and sent to our UI as a control plane pod.
Test Plan: n/a
Reviewers: zasgar, nserrino, #engineering
Reviewed By: zasgar, nserrino, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6197
GitOrigin-RevId: 74fdad3a6ca80bc3edba7e526db0163b4df931de
-
Michelle Nguyen authored
Summary: We have a race condition between agent deletions and schema updates. Currently, it is possible for the following to occur:
- An agent heartbeats.
- Agent updates are put into a queue to be processed.
- The agent update is processing and sees the agent is alive.
- The agent is deleted on a separate thread.
- The update handler thinks the agent is alive, so it adds a schema for an agent which should no longer exist.
To fix this I did a few things:
- Refactored agent deletion so that it only occurs in a single place (when the agentHandler is quitting). This required some use of channels and blocking so that agent A, with the same hostname as a newly registering agent B, is deleted before we try to register agent B.
- Moved agent schema/process updates out of the singular queue. They are now processed within the onAgentHeartbeat call. Now, if an agent is deleted, it must finish processing the current onAgentHeartbeat and apply the schema/process updates before the agent is actually deleted from the metadata store. Likewise, if a heartbeat comes in after the agent is already deleted, the AgentHandler has already stopped, so no updates will be made for this agent.
Test Plan: Ran skaffold with a bunch of logs to check that things are done in the right order.
Reviewers: zasgar, nserrino, #engineering
Reviewed By: zasgar, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6195
GitOrigin-RevId: 346e26e3877fddea2d8a205be2d3601c70ba50eb
-
Zain Asgar authored
Summary: We can't delete an entry while iterating over the container. This fixes that issue by creating a deletion list.
Test Plan: ASAN fix.
Reviewers: michelle, #engineering
Reviewed By: michelle, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6196
GitOrigin-RevId: 1f65706ebeb3ce7c02ef6306378045bf83c61579
-
Michelle Nguyen authored
Summary: This should give us more control and insight into defrag errors, if any. For now, we start defragging once the etcd instance hits 500MB; after that, we defrag every hour unless it drops below 500MB. We can calibrate this more once we get a better sense of when we should do defrags. We should also not perform any cache flushes while defragging, so that all of the agent data stays in memory.
Test Plan: Shortened the times so that I don't have to actually wait for an hour, then ran on skaffold with the etcd stateful set and the etcd operator.
Reviewers: zasgar, nserrino, #engineering
Reviewed By: zasgar, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6189
GitOrigin-RevId: 35db15ea116605081d2536d9c7e5af5751cb224d
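The defrag policy described here could look roughly like the following Go sketch using the etcd clientv3 maintenance API. The loop structure, logging, and threshold constant are assumptions for illustration; only Status and Defragment are real clientv3 calls.

```go
package etcdmaint

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

const defragThresholdBytes = 500 * 1024 * 1024 // start defragging at ~500MB

// defragLoop checks the etcd DB size every hour and defragments the endpoint
// whenever it is above the threshold. Sketch only, not the actual code.
func defragLoop(ctx context.Context, cli *clientv3.Client, endpoint string) {
	ticker := time.NewTicker(time.Hour)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			status, err := cli.Status(ctx, endpoint)
			if err != nil {
				log.Printf("failed to get etcd status: %v", err)
				continue
			}
			if status.DbSize < defragThresholdBytes {
				continue // below the threshold; nothing to do
			}
			// Defrag can block reads/writes on this member, so the caller
			// should also pause cache flushes while this runs (as noted above).
			if _, err := cli.Defragment(ctx, endpoint); err != nil {
				log.Printf("etcd defrag failed: %v", err)
			}
		}
	}
}
```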
-
- 09 Sep, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: Currently, the metadata index requests block the processing of any agent messages. This should only really happen when a cluster starts up, or when the cloud sees that any metadata is missing. When the metadata service receives a message on NATS, it either calls the agentHandler.HandleMessage function, which puts the message on the correct agent channel, or, if it is a request for missing metadata from the cloud indexer, it calls the MetadataTopicListener.HandleMessage function, which makes the request to etcd and sends out the response. The latter needs to be processed on a separate channel.
Test Plan: Ran skaffold.
Reviewers: zasgar, #engineering
Reviewed By: zasgar, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6183
GitOrigin-RevId: 1ac7e757dd36f4e4647efb3d35e4bc1ba88d1af2
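A rough Go sketch of the routing this commit describes: the NATS callback only dispatches, and agent messages and cloud-indexer requests are drained by separate goroutines so a slow etcd-backed indexer request cannot stall agent handling. The Message type, its kinds, and the buffer sizes are hypothetical.

```go
package msgbus

// Message stands in for a decoded NATS message; Kind and the handler
// shapes below are illustrative, not the real metadata service API.
type Message struct {
	Kind    string // "agent" or "missing_metadata"
	Payload []byte
}

// Dispatcher routes agent messages and cloud-indexer requests onto separate
// channels so they are processed independently.
type Dispatcher struct {
	agentCh   chan Message
	indexerCh chan Message
}

func NewDispatcher() *Dispatcher {
	d := &Dispatcher{
		agentCh:   make(chan Message, 4096),
		indexerCh: make(chan Message, 256),
	}
	go d.processAgentMessages()
	go d.processIndexerRequests()
	return d
}

// HandleNATSMessage is the single NATS callback; it only routes.
func (d *Dispatcher) HandleNATSMessage(m Message) {
	if m.Kind == "missing_metadata" {
		d.indexerCh <- m
		return
	}
	d.agentCh <- m
}

func (d *Dispatcher) processAgentMessages() {
	for m := range d.agentCh {
		_ = m // forward to the per-agent handler
	}
}

func (d *Dispatcher) processIndexerRequests() {
	for m := range d.indexerCh {
		_ = m // query etcd and publish the missing-metadata response
	}
}
```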
-
- 08 Sep, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: the new `make update_bundle` requires px to be in the path and was erroring without it. updated the Jenkinsfile so that it builds the CLI and moves it to /usr/local/bin for the `make update_bundle` command Test Plan: released staging + prod cloud Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6177 GitOrigin-RevId: 3a3109ff57fb48889c50c52f8b53649e2cc61105
-
- 09 Sep, 2020 1 commit
-
-
Yaxiong Zhao authored
Test Plan: Jenkins Reviewers: oazizi, #engineering Reviewed By: oazizi, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6191 GitOrigin-RevId: 435d9b15b25c4c0fa4d5875145bc98a04f3a5d65
-
- 08 Sep, 2020 2 commits
-
-
Omid Azizi authored
Summary: A fix to the shared object path, which needs /host prefix. Plus a bunch of related fixes to propagate errors up, that were required in analyzing this case. Test Plan: Manually tested on GKE. Count on existing tests for the rest. Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6180 GitOrigin-RevId: bbc303a4e4942a59fdfa1634998aee43354d07af
-
Michelle Nguyen authored
Summary: Some of our customers' clusters were hitting a problem where the number of bytes we were flushing from the cache was too large for a single txn. We currently batch requests by a max number of txns; however, it is possible to have really large requests with fewer than the max number of txns. This updates our etcd batcher util to take the max number of bytes into account as well.
Test Plan: Unit test.
Reviewers: zasgar, #engineering, nserrino
Reviewed By: #engineering, nserrino
Differential Revision: https://phab.corp.pixielabs.ai/D6178
GitOrigin-RevId: 12b4cf9a3ee5ce1b082effe34b930cd5dcc21120
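The byte-aware batching could be sketched like this in Go: a batch is flushed once adding the next put would exceed either the op limit or the byte limit. The limits, the opSize estimate, and the function shape are assumptions; only clientv3.OpPut is a real etcd client call.

```go
package etcdbatch

import "go.etcd.io/etcd/clientv3"

// Limits for a single etcd transaction. The exact values are illustrative.
const (
	maxOpsPerTxn   = 64
	maxBytesPerTxn = 1 << 20 // ~1MB, safely under etcd's request size limit
)

// opSize roughly estimates how much a put contributes to the request size.
func opSize(key, val string) int { return len(key) + len(val) }

// batchPuts splits key/value pairs into batches that respect both the op
// count limit and the byte limit, so one batch can't blow past etcd's max
// request size even when individual values are large.
func batchPuts(keys, vals []string) [][]clientv3.Op {
	var batches [][]clientv3.Op
	var cur []clientv3.Op
	curBytes := 0

	for i := range keys {
		sz := opSize(keys[i], vals[i])
		// Flush the current batch if adding this op would exceed a limit.
		if len(cur) > 0 && (len(cur) >= maxOpsPerTxn || curBytes+sz > maxBytesPerTxn) {
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, clientv3.OpPut(keys[i], vals[i]))
		curBytes += sz
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}
```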
-
- 09 Sep, 2020 5 commits
-
-
Yaxiong Zhao authored
Test Plan: Jenkins Reviewers: oazizi, #engineering Reviewed By: oazizi, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6184 GitOrigin-RevId: 2b5a33f13b1b0de6af3a108355a1e0f6b3a7a466
-
Natalie Serrino authored
add in some extra log messages to provide more context when debugging metadata<->query broker issues. Summary: tsia. wanted to avoid printing actual agent ids because it clogs up the logs on large clusters. Test Plan: n/a Reviewers: michelle, zasgar, #engineering Reviewed By: michelle, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6190 GitOrigin-RevId: bf9b6e3caea6a9ad61ada544ced5f2337721493c
-
Yaxiong Zhao authored
Test Plan: Manual test with nc:
* Launch stirling_wrapper
* nc -l 50050
* nc localhost 50050
When there is no data sent, no records are exported from stirling_wrapper.
Reviewers: oazizi, #engineering
Reviewed By: oazizi, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6185
GitOrigin-RevId: 5e1664b5b23443c9d0ebae4a005bc88910f3009c
-
Omid Azizi authored
Summary: The compiler now uses AUTO as the language. This enables tracing shared libraries from the UI.
Test Plan: Test added.
Reviewers: yzhao, #engineering
Reviewed By: yzhao, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6175
GitOrigin-RevId: 9f0b357b2477b4ac09d471e661eaed195ca249c9
-
Natalie Serrino authored
Summary: tsia Test Plan: n/a Reviewers: michelle, #engineering, zasgar Reviewed By: michelle, #engineering, zasgar Differential Revision: https://phab.corp.pixielabs.ai/D6186 GitOrigin-RevId: ad05c11bfaf955a929e1ec9e9246013255dedca6
-
- 07 Sep, 2020 1 commit
-
-
Omid Azizi authored
Summary: The program was being compiled once in the test and once by Create(). Avoid this, and get the schema from the connector. Also removed a bunch of member variables where local ones would suffice.
Test Plan: Existing tests.
Reviewers: yzhao, #engineering
Reviewed By: yzhao, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6174
GitOrigin-RevId: 521c2bfcccd6fceb9046c35c4de49208a62a58d6
-
- 05 Sep, 2020 1 commit
-
-
Natalie Serrino authored
PP-2115: Update the GRPCSink node to send a request initializing the result stream before sending any Carnot results.
Summary: Depends on D6167. Carnot upstream result destinations (such as Kelvin or the query broker) need to be able to track which of their inbound streams have initiated a connection, and monitor those connections for health. If any of those downstream result connections becomes unhealthy or takes too long to connect, we will cancel and time out the query. This diff adds the logic to GRPCSinkNode to send those stream initialization messages as soon as Open() is called. That way, the remote destinations don't have to wait for data to be produced by the source node to know the state of the connection, since production of result data may take a long time in a streaming query where the results are sparse.
Next up is switching the way that exec_graph.cc and the query broker do timeouts. They should no longer time out when it takes too long to receive a result. They should time out if an inbound source takes too long to initiate a connection. Then they should monitor the successful connections during query execution to make sure nothing has been disconnected (the query broker already does this part).
Test Plan: Edited unit tests.
Reviewers: michelle, philkuz, zasgar, #engineering
Reviewed By: michelle, #engineering
JIRA Issues: PP-2115
Differential Revision: https://phab.corp.pixielabs.ai/D6169
GitOrigin-RevId: 7a7aa0cc0f39b156e3038322320d31de00000d44
-
- 03 Sep, 2020 1 commit
-
-
Natalie Serrino authored
PP-2115: Change the TransferResultSink API to support result sinks initializing their connection to the downstream destination.
Summary: Streaming queries may spend a long time between sending result batches if the data they are producing is sparse. That means that the timeout-based approach of monitoring query health will not scale to streaming queries, because a timeout is no longer an accurate way of assessing whether a query is healthy. We want to replace this timeout method with checking the health of the gRPC connections from the sources to the destinations for the result data. Kelvin nodes will cancel their queries if the connection between them and their source data agents gets lost. The query broker already does this handling: if one of its agent streams gets disconnected, it will cancel the rest of the query everywhere else. However, the question remained of how a query would be cancelled if its connections from source to destination were never established in the first place.
The solution here is to have all sinks send a message with their identity and the table/result they are producing when establishing a stream to a remote destination. Then the destination can keep track of which streams it has received initialization for, and time out if a sink takes too long to set up its connection to the destination. This diff sets the groundwork for that by changing the TransferResultSink message to support sending an initial message for a stream initializing the result sink. The next diff will have the query broker expecting and using those initialization messages, and removing its timeouts. After that, exec_graph.cc/h will be refactored to no longer time out when data has not come in for a while, but to time out if any of the exec graph's remote sources has taken too long to establish a connection.
Test Plan: Existing/unit tests.
Reviewers: michelle, philkuz, zasgar, #engineering
Reviewed By: michelle, #engineering
JIRA Issues: PP-2115
Differential Revision: https://phab.corp.pixielabs.ai/D6167
GitOrigin-RevId: 8a24e118e58bbdcf07f9b57efff29eeb75d8ed09
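A hedged Go sketch of the protocol shape this diff introduces: the sink's first message on the stream identifies the result table it will produce, and row batches only follow afterwards. The message and type names below are invented placeholders, not the actual TransferResultSink protos.

```go
package resultsink

// ResultRequest sketches a stream message that is either an initiation
// message or a row batch. All names are hypothetical stand-ins.
type ResultRequest struct {
	QueryID string
	// Exactly one of the following is set per message.
	InitiateStream *StreamInit
	RowBatch       *RowBatch
}

type StreamInit struct {
	TableName string // the result table this sink produces
}

type RowBatch struct {
	NumRows int64
	// ... column data elided ...
}

// resultStream is the minimal client-streaming surface the sink needs.
type resultStream interface {
	Send(*ResultRequest) error
}

// openSink sends the initiation message as soon as the sink opens, so the
// destination can start tracking and health-checking this connection even if
// the first row batch takes a long time (e.g. a sparse streaming query).
func openSink(stream resultStream, queryID, table string) error {
	return stream.Send(&ResultRequest{
		QueryID:        queryID,
		InitiateStream: &StreamInit{TableName: table},
	})
}
```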
-
- 08 Sep, 2020 2 commits
-
-
Yaxiong Zhao authored
Summary: Previously, a conn_stats record was kept forever, resulting in ever-expanding memory use and ever-increasing data being exported to the table store.
Test Plan: Manual test with stirling_wrapper:
1. Launch stirling_wrapper
2. Launch nc -l 50050
3. Launch nc localhost 50050 -q 0 (-q 0 allows closing the connection with Ctrl-D)
4. Observed that after closing the nc connection, stirling_wrapper no longer exports records for the closed connection.
Writing a test with TCPSocket, but it is time consuming, so leaving it for a follow-up diff.
Reviewers: oazizi, #engineering
Reviewed By: oazizi, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6181
GitOrigin-RevId: 085fe8681b936c513671560927ee8fe5d3ebf2e5
-
Omid Azizi authored
Summary: TBD Test Plan: Existing tests Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6171 GitOrigin-RevId: 52e3df9cd42d864883e2d4a66f46011a97a8d388
-
- 04 Sep, 2020 1 commit
-
-
Omid Azizi authored
Summary: As requested; has TODOs for when the proto is updated.
Test Plan: Added a test; probably needs more with the proto changes.
Reviewers: nserrino, #engineering, philkuz
Reviewed By: nserrino, #engineering
JIRA Issues: PP-2191
Differential Revision: https://phab.corp.pixielabs.ai/D6148
GitOrigin-RevId: 12e43f384aefb02cff62a9de62b285862c35e909
-
- 08 Sep, 2020 2 commits
-
-
Yaxiong Zhao authored
Summary: This makes it easy to distinguish old and new logs.
Test Plan: Manual run; works as expected.
Reviewers: oazizi, #engineering
Reviewed By: oazizi, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6176
GitOrigin-RevId: 2f4b6788513a90a62e0389a78d0df35d3d74b638
-
Michelle Nguyen authored
Summary: The previous name for our log index (vizier-logs-allclusters-2) was matching the index name pattern for our old logs (vizier-logs-allclusters-*), which was causing some weirdness with reindexing. As a result, we have a log index that is 600GB. After we create this new log index, which should hopefully get managed correctly, we'll need to delete vizier-logs-allclusters-2.
Test Plan: n/a
Reviewers: jamesbartlett, zasgar, #engineering
Reviewed By: jamesbartlett, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6179
GitOrigin-RevId: 8141b13b5c62adb9754208763de8f2e9b2a631af
-
- 04 Sep, 2020 2 commits
-
-
Michelle Nguyen authored
Summary: we're still running out of space on customer's etcd, except now metadata is reporting that the etcd data itself is only 600 mb... we need to figure out what else is taking up space. a higher snapshot count increases the amount of snapshots we take, but reduces the number of raft logs we need to hold in memory. Test Plan: n/a Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6170 GitOrigin-RevId: e29cd8642e6c8911e555e13aa6dd5c9ffe8df6ea
-
James Bartlett authored
Summary: TSIA Test Plan: Added test to repro bug. Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering JIRA Issues: PP-2190 Differential Revision: https://phab.corp.pixielabs.ai/D6162 GitOrigin-RevId: 3748dbc76b1baf1efa36f38b12d88dd10c6be9db
-
- 06 Sep, 2020 1 commit
-
-
Omid Azizi authored
Summary: To deploy libraries, including the case where the library is in a container. Test Plan: Added a test for a library inside a container Reviewers: #engineering, yzhao Reviewed By: #engineering, yzhao Subscribers: yzhao Differential Revision: https://phab.corp.pixielabs.ai/D6151 GitOrigin-RevId: 3ce58fe7138ebd5c6ae8a3bf89e9b07c9ba43a38
-
- 05 Sep, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: We were seeing "TypeError: Cannot read property 'identify' of undefined". This is because the withLDProvider that sets up the client initializes asynchronously and can sometimes take longer to load than the Vizier page takes to render. To fix this we could use asyncWithLDProvider, which would block the rest of the page from rendering until the LDClient is loaded; the documentation says this may sometimes take up to 200ms. Instead, I just wrapped the LDClient code in an if statement. I took a look at the implementation of withLDClient, and it uses React context, so when the LDClient does load, this should cause the Vizier page to rerender and properly start up the LDClient.
Test Plan: n/a
Reviewers: zasgar, #engineering
Reviewed By: zasgar, #engineering
Differential Revision: https://phab.corp.pixielabs.ai/D6166
GitOrigin-RevId: 044c7890105f73520a68260ea1dd36c1d28eb8e8
-
- 04 Sep, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: we recently updated the pxl makefile. there is no more staging bundle that needs to be updated after a staging deploy, and now the command is just "update_bundle" for prod Test Plan: ran jenkins job Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6168 GitOrigin-RevId: 8ee18ddca9f304f31000d88c14e7553e30a62757
-