This project is mirrored from https://gitee.com/cowcomic/pixie.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 04 Sep, 2020 3 commits
-
-
Natalie Serrino authored
Summary: Thought that this should be an error condition, but based on the control flow it can be correct that an agent will fail to send exec stats before the kelvin completes the query. In addition, an agent may go away in the middle of the query and that is not an error. Test Plan: n/a Reviewers: zasgar, #engineering, philkuz Reviewed By: #engineering, philkuz Subscribers: philkuz Differential Revision: https://phab.corp.pixielabs.ai/D6159 GitOrigin-RevId: 64f54b196b1a16a3f98164f77cff077ed6a4e2bf
-
Michelle Nguyen authored
Summary: There was an excessive number of calls to the GetHostnameIPPairFromPod function whenever an endpoint was updated. This function is used to determine which IPs (and therefore which agents) each pod in the endpoint maps to. This happened because we were trying to use a function that had been created for another purpose: for this endpointUpdate, get the resourceUpdate for this particular host. Instead, it's probably cleaner to write a new function that better fits the purpose: for this endpointUpdate, get all resourceUpdates that should be sent, plus the hosts those updates should be sent to. Test Plan: unit test + ran on skaffold with extra logging to make sure we're sending correct updates still Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6161 GitOrigin-RevId: 3b6eb14548c9fafecb215b3a960124ed9043f7cb
-
Zain Asgar authored
Summary: UDF looks up DNS information. It's experimental b/c we probably want to add some caching if this turns out to be useful. ``` df.hostname = px.nslookup(df.ip_add) ``` Test Plan: N/A, will add tests if this is useful. Reviewers: michelle, nserrino, #engineering Reviewed By: michelle, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6160 GitOrigin-RevId: cef51528f45300137d5314ce1f380825937fe551
-
- 03 Sep, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: etcd reads are faster if we don't need to ensure linearizability. Since we have a single replica, we shouldn't need linearizability. Test Plan: ran skaffold and a bunch of queries... it doesn't seem to break anything. Testing the timing is harder since we should probably run a real benchmark for that. Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6157 GitOrigin-RevId: ea207be6003ba62bba17de71d4cee10001e22e4c
-
- 04 Sep, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: whenever we have a lock, it should not have to wait on particularly slow operations, such as etcd. i updated most of the wrappers so that the lock is acquired after the metadata operation. @nserrino, you probably know this area best: the one that probably needs most review is the GetAgentUpdates fix, where we pull out the updates and clear the updatedAgents before making metadata calls. one condition I could imagine that can happen now is: - readInitialState is true - we start to read the agents/agentData from etcd - in the middle, we have other agent updates which are written to updatedAgents - we may or may not read these new updates from etcd, depending on timing - the next time GetAgentUpdates is called, we may read updates in updatedAgents that were already sent previously. will the query broker accept this? Test Plan: ran in skaffold and unit tests, things seem to still work Reviewers: nserrino, zasgar, #engineering Reviewed By: nserrino, zasgar, #engineering Subscribers: nserrino Differential Revision: https://phab.corp.pixielabs.ai/D6152 GitOrigin-RevId: 0f55a30d257156f98053c227868b99f2e32f91cf
-
- 03 Sep, 2020 5 commits
-
-
Natalie Serrino authored
Summary: Latest changes to support an updated Live View link (include cluster name) actually broke the case where the user doesn't provide the cluster id via -c in the CLI. This diff fixes that case so that the Live View link will always print the right cluster ID, even if it is a randomly selected one, and also it won't error out. Test Plan: ran the cli Reviewers: jamesbartlett, zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6155 GitOrigin-RevId: 6f0dbc458adb8ce2caf97124a7ffb0f6153cbd88
-
Michelle Nguyen authored
Summary: we want to compact more often so that we can save space on etcd Test Plan: created an rc and deployed both the etcd statefulset and operator. things still appear to work after a compaction has occurred. Reviewers: zasgar, jamesbartlett, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6154 GitOrigin-RevId: a07c305a8e387f7d3590c954ef6fb1c62f71ec88
-
Natalie Serrino authored
Summary: We need to update the test to fetch the pxl scripts from the github repo, where they got moved to. Test Plan: n/a Reviewers: zasgar, oazizi, #engineering Reviewed By: zasgar, #engineering JIRA Issues: PP-2188 Differential Revision: https://phab.corp.pixielabs.ai/D6153 GitOrigin-RevId: d18ac572866d040cb0386d8bd39e06c8796db48e
-
Michelle Nguyen authored
Summary: Before this was all done sequentially, so it was possible for one agent's heartbeat to block another agent's heartbeat/register request. Now, we process each agent's message in separate goroutines. Test Plan: unit test also ran on skaffold and added logs to track everything going on. it appears to work as expected... but I am also only running 2 pems + 1 kelvin. Reviewers: zasgar, nserrino, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6149 GitOrigin-RevId: c6af9306187a35e9e191da349a9b0ab6d7967ef1
-
Michelle Nguyen authored
Summary: If a session already exists, auth0 just automatically logs the user in. However, this makes it impossible for the user to change their account if they want to. Instead, it should ask them which account they want to choose. Test Plan: ran webpack Reviewers: jamesbartlett, #engineering, zasgar Reviewed By: #engineering, zasgar Differential Revision: https://phab.corp.pixielabs.ai/D6150 GitOrigin-RevId: bbc511618a648c9bb7ec8793760ee6d5a663d4ba
-
- 27 Aug, 2020 1 commit
-
-
Zain Asgar authored
Summary: Moves scripts and updates deploy command. Test Plan: `make update_bundle` Reviewers: michelle, #engineering Reviewed By: michelle, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6105 GitOrigin-RevId: 22f216abb5bdbbed32cd493404f066785882ce19
-
- 31 Aug, 2020 1 commit
-
-
Omid Azizi authored
Summary: Stumbled across a way to avoid disabling SC1090, so use it. Test Plan: Jenkins Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6136 GitOrigin-RevId: 0fc0e67f488600bce94e7b69754daf336078e390
-
- 02 Sep, 2020 1 commit
-
-
Omid Azizi authored
Summary: Automating the test running process for different environments. Also fixes a prematurely landed https://phab.corp.pixielabs.ai/D6121 Test Plan: Manual Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6134 GitOrigin-RevId: dd3e60f4055e101d8c893870aa50c01a0e095be6
-
- 31 Aug, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: I put out a previous diff that should fix the index comparison issue in log-collector, and tested on plc-dev. after landing, i tried it out on staging and was surprised to see the log-collector was still complaining about reindexing. looking at the elastic settings, i saw the log indices on plc-staging and plc both had 1 shard and 1 replica only, which was why the index comparison was still failing. renaming the index once again so that we don't lose our previous logs, but we can start up an index with the correct settings. Test Plan: n/a Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6135 GitOrigin-RevId: 50bc0f28ba2e69ea451d6c8105eeaa53514fcf40
-
- 02 Sep, 2020 1 commit
-
-
Natalie Serrino authored
Summary: otherwise px run just chooses a random cluster ID which will make the results less useful. Test Plan: ran the test Reviewers: philkuz, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6143 GitOrigin-RevId: e57258e22bcefd0db234e59f522cf7f95c6338d5
-
- 01 Sep, 2020 2 commits
-
-
Natalie Serrino authored
Summary: Fixing this to help with investigating streaming queries. When the active tab name wasn't set to a table name, we would reset it to the first table. this behavior didn't account for the fact that the stats tab was also a valid selection, so that logic is now incorporated. Test Plan: ran the ui Reviewers: michelle, philkuz, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6141 GitOrigin-RevId: 9d10c701a7a47e3bb074f3a192098e4165de957e
-
Natalie Serrino authored
Summary: While trying to debug whether or not there are streaming issues in the CLI, I noticed that our live view links are out of date, so they are updated now. Test Plan: ran px run and px live and tried the outputted urls. Reviewers: michelle, zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6139 GitOrigin-RevId: 29b0dd12dee4433f12264812fcc24044625954ff
-
- 02 Sep, 2020 1 commit
-
-
Natalie Serrino authored
Summary: Fixed this as a part of debugging whether streaming introduced new issues. This plus D6141 should fix the execution stats tab to actually give rows and bytes now. Test Plan: ran queries, checked the UI. Reviewers: philkuz, jamesbartlett, zasgar, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6142 GitOrigin-RevId: 7bc79ea27eb72b5a7f09fbbf50a8a846c60c3295
-
- 28 Aug, 2020 1 commit
-
-
Natalie Serrino authored
Summary: TSIA. Test Plan: this is the test plan Reviewers: michelle, philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-2117 Differential Revision: https://phab.corp.pixielabs.ai/D6120 GitOrigin-RevId: 580745781271daea64380b35592547db35328bb4
-
- 01 Sep, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: Same args as `px run` ``` bazel run -c opt //src/e2e_test/vizier/exectime:exectime -- -c 4388aa1e-1666-48f8-9cf7-437650e62255 ``` Test Plan: Tested with single cluster and multi-cluster Reviewers: nserrino, zasgar, jamesbartlett, #engineering Reviewed By: nserrino, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6140 GitOrigin-RevId: ddcbc99e8d81f7fdbbcf922cfbbeeae2ff63239b
-
- 31 Aug, 2020 1 commit
-
-
Phillip Kuznetsov authored
Summary: The exectime benchmark doesn't summarize errors; this adds them in. Test Plan: Tested on a cluster that fails certain queries. Reviewers: nserrino, zasgar, #engineering Reviewed By: nserrino, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6137 GitOrigin-RevId: d2242d768ba69ed49235bb37b6c1447d9dc0cc2f
-
- 28 Aug, 2020 1 commit
-
-
James Bartlett authored
Summary: A while back, I was trying to fix a bug with Limit nodes where, when there were 2 separate graphs, one limit would prevent the other graph from running. In fixing that bug, I introduced a new bug with limits that have multiple sources (i.e. a limit after a union), where the limit would output empty row batches for each of the other sources if one of the sources was enough to reach the limit. (I discovered this b/c these empty row batches each have eos set, causing an agg after the limit to output 1 row for each row batch.) This reverts my old fix and adds a new fix that works in both cases. Test Plan: I added tests for both cases; as expected, the second case's test fails on master, and both pass with this diff. Reviewers: #engineering, philkuz Reviewed By: #engineering, philkuz Differential Revision: https://phab.corp.pixielabs.ai/D6118 GitOrigin-RevId: b20f695ef286c14880b765e791bb1bc27ea62ff4
-
- 31 Aug, 2020 6 commits
-
-
Yaxiong Zhao authored
Test Plan: Jenkins Reviewers: oazizi, #engineering Reviewed By: oazizi, #engineering Subscribers: philkuz Differential Revision: https://phab.corp.pixielabs.ai/D6127 GitOrigin-RevId: a13bbc51b798f4bd4ef42d30a5594ce801bedce6
-
Yaxiong Zhao authored
Summary: Yamls for go_grpc_client and server Test Plan: Manual Reviewers: oazizi, #engineering Reviewed By: oazizi, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6111 GitOrigin-RevId: 8b2f1f635e8356a398442cba7ead32b6d28c4428
-
Omid Azizi authored
Summary: Encode the steps of testing different environments into scripts. Test Plan: Manual Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6121 GitOrigin-RevId: d5688e128de314c5dd717dc9661281c4103e99e2
-
Omid Azizi authored
Summary: String/ByteArray support was done very quickly for the demo, so some short-cuts were taken. This diff rectifies part of that. It is a step in the right direction; there is more to do. Test Plan: Existing tests. Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6123 GitOrigin-RevId: 87d558c15c54abb3a61221b0e052b151933773ac
-
Michelle Nguyen authored
Summary: A version from when I was testing different ways to handle the timeout error got committed instead of the actual solution that was in the diff. Test Plan: n/a Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6132 GitOrigin-RevId: bf22569b64cf79e4655d3ba659c9766c5073a7a7
-
Michelle Nguyen authored
Summary: We want to keep the cloud connector alive as much as possible, so that we always have a way to recover without requiring a fresh reinstall from our customers. The current update flow is: - delete all vizier resources (this includes cloudconn + its deps, minus etcd/nats) - launch new versions of all vizier resources In between the time of deletion and launching, the cloud connector deployment is completely gone from the namespace. If something happens in that time, it is possible that we may never get the cloud connector back. Instead, the flow is updated to be more like this: - delete all vizier resources (minus cloudconn + its deps) - launch new versions of all vizier resources. for cloudconn + its deps, this is an update. Now, there is no period of time where there is no cloud connector deployment. We need to additionally bounce the cloudconnector pod to handle a case where we try to update to the same version we're already on. Since this is an update now, the cloudconnector pod just keeps running. However, we expect the cloudconnector to clean up the update job upon startup. Test Plan: tried 4 cases: - update 0.4.5 to RC with new update changes - deploy RC with update changes - update RC to same RC version - update RC to newer RC version Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6129 GitOrigin-RevId: 365e9e74707e5d89a7f9c8794f8830635cf524fb
-
- 28 Aug, 2020 2 commits
-
-
Michelle Nguyen authored
Summary: turns out the previous fix didn't work, because the error wasn't actually the type I expected it to be. turns out theres no specific error type that we can check for this timeout error, so we have to check it manually. Test Plan: updated the job timeout to be 2s instead of 10 mins. confirm that my cluster updated properly anyway Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6112 GitOrigin-RevId: 5eaebfd5049f1ff5ab9e4badce9ea8d19ff7c406
-
Michelle Nguyen authored
Summary: when we made our vizier images public/private we removed :vizier_images_bundle and replaced it with :public_vizier_images_bundle and :private_vizier_images_bundle. This broke the release script for rcs because it couldn't properly get the number of changed commits and the name would always be something like 0.4.6-pre-master.0, which made creating multiple rcs from the same branch very annoying Test Plan: ran it Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6126 GitOrigin-RevId: 3bae66f02ca08924a909875440547a2567c27554
-
- 29 Aug, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: the log-collector was constantly erroring every deploy with a "reindex not supported" error, although there was no reindexing required. @jamesbartlett added some logic for comparing the index settings for the existing index, and the new index, to see whether reindexing is actually required or not. Unfortunately, even after that change, it was still complaining about reindexing not being supported. After taking a look, it's because we were passing in settings that looked like: ``` settings: { number_of_shards: 2 } ``` Which elastic does accept. However, the index that's actually stored in elastic looks more like: ``` settings: { index: { number_of_shards: 2 } } ``` so the comparison actually fails and still thinks it needs to be reindexed. To fix this, I updated our index settings to look more like the elastic-representation. Perhaps in the future we can make our elastic index checker more robust to this case, but that doesn't really seem worthwhile to me right now. Test Plan: ran plc-dev and made sure logcollector no longer crashes Reviewers: jamesbartlett, zasgar, #engineering Reviewed By: jamesbartlett, #engineering Subscribers: jamesbartlett Differential Revision: https://phab.corp.pixielabs.ai/D6131 GitOrigin-RevId: 8af6ff1b23952f78d09ef1332c1760ec1b1b3bb4
-
- 28 Aug, 2020 1 commit
-
-
Yaxiong Zhao authored
Summary: Remove TODO that is no longer relevant: the API is no longer exposing code generation details. GenProgram() -> GenBCCProgram() Test Plan: Jenkins Reviewers: oazizi, #engineering Reviewed By: oazizi, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6128 GitOrigin-RevId: cac1315c19003576a8de7c319632577a70c0567c
-
- 27 Aug, 2020 1 commit
-
-
Natalie Serrino authored
Summary: added handling in the query broker for these heartbeats, and add some unit tests that were missing. this will help for streaming queries with sparse data where data isn't being sent over that often but we don't want to timeout. we can use the heartbeats to send over the execution stats for the query up to that point as well, so that streaming queries can still show something for the execution stats. Test Plan: added tests Reviewers: michelle, zasgar, #engineering, philkuz Reviewed By: #engineering, philkuz JIRA Issues: PP-2115 Differential Revision: https://phab.corp.pixielabs.ai/D6113 GitOrigin-RevId: d4df2b1e4da244425b98ccd30cc16a620908ae5d
-
- 28 Aug, 2020 2 commits
-
-
Natalie Serrino authored
PP-2115: Deprecate old QueryBrokerService (qb implements other services) and its only API, ReceiveAgentResult. Summary: These have now been subsumed by ResultSinkService, TransferResultChunk. Test Plan: existing Reviewers: michelle, philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering JIRA Issues: PP-2115 Differential Revision: https://phab.corp.pixielabs.ai/D6119 GitOrigin-RevId: d8580bbeb74bc77d06f8f57cdd98fa320ad371de
-
Michelle Nguyen authored
Summary: when registering a vizier with a name that we already have in the database, we do a loop 10 times to try registering with the name <name_%d>, where d is the number of times the loop has run. This works when there are only 10 clusters with the same name, but once the 11th one tries to register, we error. Obviously this isn't scalable. Instead, we can try to get a random number from a large pool of large numbers. This is happening especially in the case where everyone's cluster is named "minikube" Test Plan: unit test Reviewers: zasgar, #engineering, oazizi Reviewed By: #engineering, oazizi Subscribers: oazizi Differential Revision: https://phab.corp.pixielabs.ai/D6107 GitOrigin-RevId: 19cab69e994285f865f7e1f11c2849552140073b
-
- 27 Aug, 2020 1 commit
-
-
Michelle Nguyen authored
Summary: currently we have no insight into the etcd running on our customer's clusters. Luckily, with the client we can hit some of the etcd endpoints to see the amount of space we're currently using. We're constantly running into space issues on customer clusters, so this should give us a sense of how often we may need to defrag or if a defrag won't be enough. Test Plan: ran in skaffold Reviewers: zasgar, #engineering Reviewed By: zasgar, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6115 GitOrigin-RevId: cd3756359cf209ccfd3d11df28726e6959fd98e5
-
- 28 Aug, 2020 4 commits
-
-
Michelle Nguyen authored
Summary: We ran into an error in the querybroker where the agent state was unable to update because of this error occurring every 5s: ```Received error running agent tracker loop. Retrying in 5 seconds. Could not update agent table metadata of unknown agent 7309788f-79d8-4f80-ad7d-b6c3e13b47aa``` This means that the metadata sent the qb schema/data for an agent which has been deleted. Taking a look at the code, it is definitely possible to update schema/data for an agent which no longer exists. Since we process agent updates in a queue, it's possible for us to put an update on the queue and not process it before the agent has already been deleted. This is especially possible in larger clusters where there may be many updates in the queue. Before updating the datastore with the new schema/data, we should check whether the agent actually exists. Test Plan: unit test Reviewers: nserrino, #engineering, philkuz Reviewed By: #engineering, philkuz Differential Revision: https://phab.corp.pixielabs.ai/D6116 GitOrigin-RevId: 37b1eab1cd03cdf796d701e35d951fc8f5e7e582
-
Natalie Serrino authored
Summary: tsia. Test Plan: none Reviewers: michelle, zasgar, #engineering, philkuz Reviewed By: #engineering, philkuz JIRA Issues: PP-2170 Differential Revision: https://phab.corp.pixielabs.ai/D6117 GitOrigin-RevId: 861fd6deb3b513092eba783d7a2b0b1fa9a8f7e0
-
Omid Azizi authored
Summary: For convenience. Could consider turning it into a bazel sh_test. Test Plan: None Reviewers: yzhao, #engineering Reviewed By: yzhao, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6110 GitOrigin-RevId: f694fcda9a55ecc4fc201a3958b9c5f00f0137f7
-
Natalie Serrino authored
Summary: TSIA Test Plan: existing. Reviewers: michelle, philkuz, zasgar, #engineering Reviewed By: philkuz, #engineering Differential Revision: https://phab.corp.pixielabs.ai/D6109 GitOrigin-RevId: 1916cbe92049a0c77bdb5c2f9649c4ad3eb33e13
-