This project is mirrored from https://gitee.com/cowcomic/pixie.git. Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
  1. 05 Sep, 2020 1 commit
    • Fix LaunchDarkly identify bug · e476dda3
      Michelle Nguyen authored
      Summary:
      we were seeing "TypeError: Cannot read property 'identify' of undefined"
      this is because the withLDProvider that sets up the client is running synchronously and can sometimes take longer to load than the vizier page takes to render.
      to fix this we could use asyncWithLDProvider, which would block the rest of the page from rendering until the LDClient is loaded. documentation says this may sometimes take up to 200 ms.
      instead, i just wrapped the LDClient code in an if statement. i took a look at the implementation of withLDClient, and it uses React context. so, when the LDClient does load, this should cause the vizier page to rerender and properly start up the LDClient.
      
      Test Plan: n/a
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6166
      
      GitOrigin-RevId: 044c7890105f73520a68260ea1dd36c1d28eb8e8
  2. 04 Sep, 2020 1 commit
    • Fix jenkins cloud build · 7c970b94
      Michelle Nguyen authored
      Summary: we recently updated the pxl makefile. there is no more staging bundle that needs to be updated after a staging deploy, and now the command is just "update_bundle" for prod
      
      Test Plan: ran jenkins job
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6168
      
      GitOrigin-RevId: 8ee18ddca9f304f31000d88c14e7553e30a62757
  3. 05 Sep, 2020 3 commits
    • Trace getaddrinfo in libc · 3f22e84e
      Omid Azizi authored
      Summary:
      Mostly checking in the proto as test code.
      
      Some accompanying fixes.
      
      Test Plan: See new test in stirling_dt_bpf_test
      
      Reviewers: yzhao, #engineering
      
      Reviewed By: yzhao, #engineering
      
      JIRA Issues: PP-2183
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6146
      
      GitOrigin-RevId: 76d2e395e6e1eb412d774da0c5a3047c2b972397
    • StatusOr: ValueOr() · 24c6b7c8
      Omid Azizi authored
      Summary:
      To simplify cases where, when there's an error, we want to use a default value.
      
      Similar to std::optional's value_or.
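
A minimal Python sketch of the idea, not the C++ `StatusOr` from this diff. The class, its fields, and `parse_port` are hypothetical stand-ins, used only to show how a value-or-default accessor removes the explicit error branch at the call site.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class StatusOr(Generic[T]):
    """Holds either a value or an error message (hypothetical stand-in)."""

    value: Optional[T] = None
    error: Optional[str] = None

    def ok(self) -> bool:
        return self.error is None

    def value_or(self, default: T) -> T:
        # Return the held value on success, otherwise the caller's default.
        return self.value if self.ok() else default


def parse_port(text: str) -> "StatusOr[int]":
    # Hypothetical helper, present only to produce both outcomes.
    if text.isdigit():
        return StatusOr(value=int(text))
    return StatusOr(error=f"not a port: {text!r}")
```

With this shape, a caller can write `parse_port(raw).value_or(80)` instead of checking `ok()` and branching.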
      
      Test Plan: Added tests.
      
      Reviewers: yzhao, zasgar, #engineering
      
      Reviewed By: yzhao, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6172
      
      GitOrigin-RevId: 0c7d5952315fd64e016017227ed0563a1ac3e0dd
    • Ensure containers are cleaned-up · 7e4bdfba
      Omid Azizi authored
      Summary:
      Noticed some containers leaking with Ctrl-C of tests.
      
      Orphaned sub-processes were not getting cleaned up by the "timeout" command, so this switches to a different approach.
      
      Test Plan: Manual
      
      Reviewers: yzhao, #engineering
      
      Reviewed By: yzhao, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6164
      
      GitOrigin-RevId: 303f8c62438f97698edd03d2a98c79fdffcb69a2
  4. 28 Aug, 2020 1 commit
  5. 02 Sep, 2020 1 commit
  6. 03 Sep, 2020 1 commit
    • ElfReader: Locate debug symbols by debug-link · 1abfe8f9
      Omid Azizi authored
      Summary:
      We already had the ability to locate debug symbols by build-id.
      We also need support for locating them by debug-link. This adds that.
      
      The DwarfReader is also adjusted to use the external debug symbols, which it previously was not.
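
The commit does not say which paths ElfReader searches. As an illustration only, the sketch below lists the candidate locations that GDB documents for a `.gnu_debuglink` file name. `debug_link_candidates` and the default global debug directory are assumptions, not names taken from this diff.

```python
from pathlib import PurePosixPath
from typing import List


def debug_link_candidates(binary: str, debug_link: str,
                          global_debug_dir: str = "/usr/lib/debug") -> List[str]:
    """List where a debug-link file is conventionally looked up (GDB order).

    `binary` must be an absolute path; a relative path raises ValueError.
    """
    exe_dir = PurePosixPath(binary).parent
    return [
        # 1. Next to the binary itself.
        str(exe_dir / debug_link),
        # 2. In a ".debug" subdirectory next to the binary.
        str(exe_dir / ".debug" / debug_link),
        # 3. Under the global debug directory, mirroring the binary's path.
        str(PurePosixPath(global_debug_dir) / exe_dir.relative_to("/") / debug_link),
    ]
```

A real reader would also verify the CRC stored alongside the debug-link name before trusting a candidate; that step is omitted here.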
      
      Test Plan: Added ElfReader tests
      
      Reviewers: yzhao, #engineering
      
      Reviewed By: yzhao, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6144
      
      GitOrigin-RevId: c74e597369a29e29ce2292ca47f4a59ef1b8e4a0
  7. 04 Sep, 2020 5 commits
    • Metadata: fix excessive cidr logging · 451bc997
      Michelle Nguyen authored
      Summary:
      metadata service keeps logging "Error updating Pod CIDRs" for every pod update which doesn't have a PodIP... which is a lot of them.
      we should just not try to update the pod CIDR if the PodIP is empty.
      
      Test Plan: ran in skaffold
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6163
      
      GitOrigin-RevId: 6554b0026ececa21f83222b099f08f5a9d61550e
    • Add the encrypted secret json file for pulling images from gcr · 56921f24
      Yaxiong Zhao authored
      Test Plan: Manual test with run_on_k8s.sh, works fine
      
      Reviewers: oazizi, #engineering, zasgar
      
      Reviewed By: #engineering, zasgar
      
      Subscribers: zasgar
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6075
      
      GitOrigin-RevId: 788cf0e6dd85b716723e5d8e213d0b7e4f3ab03f
    • Downgrade inconsistent # of agent exec stats from error to log · 348bf597
      Natalie Serrino authored
      Summary: Thought that this should be an error condition, but based on the control flow it can be correct that an agent will fail to send exec stats before the kelvin completes the query. In addition, an agent may go away in the middle of the query and that is not an error.
      
      Test Plan: n/a
      
      Reviewers: zasgar, #engineering, philkuz
      
      Reviewed By: #engineering, philkuz
      
      Subscribers: philkuz
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6159
      
      GitOrigin-RevId: 64f54b196b1a16a3f98164f77cff077ed6a4e2bf
    • PP-2189 Reduce calls to GetHostnameIPPairFromPod during endpoints updates · daabc981
      Michelle Nguyen authored
      Summary:
      there was an excessive number of calls to the GetHostnameIPPairFromPod function whenever an endpoint was updated.
      this function is used to determine which IPs (and therefore which agents) each pod in the endpoint maps to.
      This is because we were trying to use a function that had been created for another purpose: for this endpointUpdate, get the resourceUpdate for this particular host.
      instead, it's probably cleaner to write a new function that better fits the purpose: for this endpointUpdate, get all resourceUpdates that should be sent + the hosts those updates should be sent to
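
A rough sketch of the "one pass over the endpoint" shape described above, under assumed data shapes: pods are plain strings, and `host_for_pod` is a hypothetical stand-in for the per-pod lookup (the role GetHostnameIPPairFromPod plays in the real Go code). The point is only that each pod is resolved once and the result is grouped by host.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def updates_by_host(endpoint_pods: List[str],
                    host_for_pod: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group the pods of one endpoint update by the host that should receive them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for pod in endpoint_pods:
        # Exactly one lookup per pod, instead of one lookup per (pod, host) pair.
        grouped[host_for_pod(pod)].append(pod)
    return dict(grouped)
```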
      
      Test Plan: unit test + ran on skaffold with extra logging to make sure we're sending correct updates still
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6161
      
      GitOrigin-RevId: 3b6eb14548c9fafecb215b3a960124ed9043f7cb
    • Add experimental UDF to perform nslookup · d52f37fd
      Zain Asgar authored
      Summary:
      UDF looks up DNS information. It's experimental b/c we probably want to add some caching if this turns out to be useful.
      
      ```
      df.hostname = px.nslookup(df.ip_add)
      ```
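
The caching mentioned above is not part of this commit. Purely as an illustration of what it could look like, here is a Python sketch that memoizes reverse lookups. `make_cached_nslookup`, the fallback of returning the IP unchanged on failure, and the cache size are all assumptions.

```python
import socket
from functools import lru_cache
from typing import Callable


def _reverse_lookup(ip: str) -> str:
    """Resolve an IP to a hostname, falling back to the IP on failure (assumed policy)."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def make_cached_nslookup(resolver: Callable[[str], str] = _reverse_lookup,
                         maxsize: int = 1024) -> Callable[[str], str]:
    # Wrap the resolver so that repeated IPs are answered from memory.
    return lru_cache(maxsize=maxsize)(resolver)
```

`lru_cache` has no expiry, so a production version would likely need a TTL to pick up DNS changes.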
      
      Test Plan: N/A, will add tests if this is useful.
      
      Reviewers: michelle, nserrino, #engineering
      
      Reviewed By: michelle, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6160
      
      GitOrigin-RevId: cef51528f45300137d5314ce1f380825937fe551
  8. 03 Sep, 2020 1 commit
    • Run etcd Gets with serializable option · ad08b896
      Michelle Nguyen authored
      Summary: etcd reads are faster if we don't need to ensure linearizability. since we have a single replica, we shouldn't need linearizability
      
      Test Plan:
      ran skaffold and a bunch of queries... it doesn't seem to break anything.
      testing the timing is harder since we should probably run a real benchmark for that
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6157
      
      GitOrigin-RevId: ea207be6003ba62bba17de71d4cee10001e22e4c
  9. 04 Sep, 2020 1 commit
    • Rm updatedAgentsMutex dependency on etcd calls · d31ff9c1
      Michelle Nguyen authored
      Summary:
      whenever we have a lock, it should not have to wait on particularly slow operations, such as etcd.
      i updated most of the wrappers so that the lock is acquired after the metadata operation.
      
      @nserrino, you probably know this area best:
      the one that probably needs most review is the GetAgentUpdates fix, where we pull out the updates and clear the updatedAgents before making metadata calls.
      one condition I could imagine that can happen now is:
      - readInitialState is true
      - we start to read the agents/agentData from etcd
      - in the middle, we have other agent updates which are written to updatedAgents
      - we may or may not read these new updates from etcd, depending on timing
      - the next time GetAgentUpdates is called, we may read updates in updatedAgents that were already sent previously. will the query broker accept this?
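
The locking pattern described above (take the pending updates and clear the list under the lock, then do the slow reads without it) can be sketched as follows. This is a Python illustration, not the Go code in the diff: `AgentUpdateTracker` and `fetch` are hypothetical, with `fetch` standing in for an etcd read.

```python
import threading
from typing import Callable, Dict, List


class AgentUpdateTracker:
    def __init__(self, fetch: Callable[[str], Dict]) -> None:
        self._mutex = threading.Lock()
        self._updated_agents: List[str] = []
        self._fetch = fetch  # slow operation, must not run under the lock

    def mark_updated(self, agent_id: str) -> None:
        with self._mutex:
            self._updated_agents.append(agent_id)

    def get_agent_updates(self) -> List[Dict]:
        # Hold the lock only long enough to swap out the pending list.
        with self._mutex:
            pending, self._updated_agents = self._updated_agents, []
        # The lock is released here, so writers are not blocked by slow reads.
        return [self._fetch(agent_id) for agent_id in pending]
```

Updates that arrive while the slow reads are in flight land in the fresh list and are returned by the next call, which is the window the question above is about.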
      
      Test Plan: ran in skaffold and unit tests, things seem to still work
      
      Reviewers: nserrino, zasgar, #engineering
      
      Reviewed By: nserrino, zasgar, #engineering
      
      Subscribers: nserrino
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6152
      
      GitOrigin-RevId: 0f55a30d257156f98053c227868b99f2e32f91cf
  10. 03 Sep, 2020 5 commits
    • Fix cli case where cluster ID is not specified. · 1252a2f2
      Natalie Serrino authored
      Summary:
      Latest changes to support an updated Live View link (include cluster name) actually broke the case where the user doesn't provide the cluster id via -c in the CLI.
      This diff fixes that case so that the Live View link will always print the right cluster ID, even if it is a randomly selected one, and also it won't error out.
      
      Test Plan: ran the cli
      
      Reviewers: jamesbartlett, zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6155
      
      GitOrigin-RevId: 6f0dbc458adb8ce2caf97124a7ffb0f6153cbd88
    • Update etcd compaction scheme · d581dacc
      Michelle Nguyen authored
      Summary: we want to compact more often so that we can save space on etcd
      
      Test Plan: created an rc and deployed both the etcd statefulset and operator. things still appear to work after a compaction has occurred.
      
      Reviewers: zasgar, jamesbartlett, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6154
      
      GitOrigin-RevId: a07c305a8e387f7d3590c954ef6fb1c62f71ec88
    • PP-2188: disable preset queries test · f3a5123e
      Natalie Serrino authored
      Summary: We need to update the test to fetch the pxl scripts from the github repo, where they got moved to.
      
      Test Plan: n/a
      
      Reviewers: zasgar, oazizi, #engineering
      
      Reviewed By: zasgar, #engineering
      
      JIRA Issues: PP-2188
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6153
      
      GitOrigin-RevId: d18ac572866d040cb0386d8bd39e06c8796db48e
    • Handle agent register/heartbeat messages concurrently · cb5fcdf7
      Michelle Nguyen authored
      Summary:
      Before, this was all done sequentially, so it was possible for one agent's heartbeat to block another agent's heartbeat/register request.
      Now, we process each agent's message in separate goroutines.
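
The real change uses goroutines. The same idea in Python, as a sketch only, hands each message to a worker so that a slow handler for one agent cannot stall another. `dispatch_concurrently` and the `(agent_id, payload)` message shape are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (agent_id, payload), a hypothetical shape


def dispatch_concurrently(messages: List[Message],
                          handler: Callable[[str, str], str],
                          max_workers: int = 8) -> List[str]:
    """Run the handler for every message on its own worker thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handler, agent_id, payload)
                   for agent_id, payload in messages]
        # Results are collected in submission order; execution overlaps.
        return [future.result() for future in futures]
```

This sketch gives no ordering guarantee between two messages from the same agent; whether the real handler needs one is not stated in the commit.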
      
      Test Plan:
      unit test
      also ran on skaffold and added logs to track everything going on. it appears to work as expected... but I am also only running 2 pems + 1 kelvin.
      
      Reviewers: zasgar, nserrino, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6149
      
      GitOrigin-RevId: c6af9306187a35e9e191da349a9b0ab6d7967ef1
    • Auth0 should always prompt user to choose account · eef086f6
      Michelle Nguyen authored
      Summary: if a session already exists, auth0 just automatically logs the user in. however, this makes it impossible for the user to switch accounts if they want to. instead, it should ask them which account they want to use.
      
      Test Plan: ran webpack
      
      Reviewers: jamesbartlett, #engineering, zasgar
      
      Reviewed By: #engineering, zasgar
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6150
      
      GitOrigin-RevId: bbc511618a648c9bb7ec8793760ee6d5a663d4ba
  11. 27 Aug, 2020 1 commit
  12. 31 Aug, 2020 1 commit
  13. 02 Sep, 2020 1 commit
  14. 31 Aug, 2020 1 commit
    • Fix logcollector index in staging/prod · 70945d60
      Michelle Nguyen authored
      Summary:
      I put out a previous diff that should fix the index comparison issue in log-collector, and tested on plc-dev.
      after landing, i tried it out on staging and was surprised to see the log-collector was still complaining about reindexing.
      looking at the elastic settings, i saw the log indices on plc-staging and plc both had 1 shard and 1 replica only, which was why the index comparison was still failing.
      this renames the index once again, so that we don't lose our previous logs but can start up an index with the correct settings.
      
      Test Plan: n/a
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6135
      
      GitOrigin-RevId: 50bc0f28ba2e69ea451d6c8105eeaa53514fcf40
  15. 02 Sep, 2020 1 commit
  16. 01 Sep, 2020 2 commits
    • Fix bug in UI where Execution Stats tab didn't work when clicked · dce7b2bd
      Natalie Serrino authored
      Summary: Fixing this to help with investigating streaming queries. When the active tab name wasn't set to a table name, we would reset it to the first table. this behavior didn't account for the fact that the stats tab was also a valid selection, so that logic is now incorporated.
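
The tab-selection rule described above can be sketched in a few lines. The UI itself is not written in Python, and `STATS_TAB`, `resolve_active_tab`, and the `None` result for an empty table list are hypothetical choices made for this illustration.

```python
from typing import List, Optional

STATS_TAB = "stats"  # hypothetical identifier for the Execution Stats tab


def resolve_active_tab(active: Optional[str], table_names: List[str]) -> Optional[str]:
    """Keep the current tab if it is a table or the stats tab, else fall back."""
    if active == STATS_TAB or active in table_names:
        return active
    # Previous behavior, kept for every other case: reset to the first table.
    return table_names[0] if table_names else None
```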
      
      Test Plan: ran the ui
      
      Reviewers: michelle, philkuz, #engineering
      
      Reviewed By: philkuz, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6141
      
      GitOrigin-RevId: 9d10c701a7a47e3bb074f3a192098e4165de957e
    • Fix live view links in CLI to be cluster-specific. · 550abb7e
      Natalie Serrino authored
      Summary: While trying to debug whether or not there are streaming issues in the CLI, I noticed that our live view links are out of date, so they are updated now.
      
      Test Plan: ran px run and px live and tried the outputted urls.
      
      Reviewers: michelle, zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6139
      
      GitOrigin-RevId: 29b0dd12dee4433f12264812fcc24044625954ff
  17. 02 Sep, 2020 1 commit
  18. 28 Aug, 2020 1 commit
  19. 01 Sep, 2020 1 commit
    • Add single cluster support for exectime · 66e184fa
      Phillip Kuznetsov authored
      Summary:
      Same args as `px run`
      ```
      bazel run -c opt //src/e2e_test/vizier/exectime:exectime -- -c 4388aa1e-1666-48f8-9cf7-437650e62255
      ```
      
      Test Plan: Tested with single cluster and multi-cluster
      
      Reviewers: nserrino, zasgar, jamesbartlett, #engineering
      
      Reviewed By: nserrino, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6140
      
      GitOrigin-RevId: ddcbc99e8d81f7fdbbcf922cfbbeeae2ff63239b
  20. 31 Aug, 2020 1 commit
  21. 28 Aug, 2020 1 commit
    • Fix bug with limit producing empty row batches. · 4444a954
      James Bartlett authored
      Summary:
      A while back, I was trying to fix a bug with Limit nodes where, when there were 2 separate graphs, one limit would prevent the other graph from running. In fixing that bug, I introduced a new bug with limits that have multiple sources (i.e. a limit after a union), where the limit would output empty row batches for each of the other sources if one of the sources was enough to reach the limit. (I discovered this b/c these empty row batches each have eos set, causing an agg after the limit to output 1 row for each row batch.)
      
      This reverts my old fix and adds a new fix that works in both cases.
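
The commit does not describe the new fix in detail. The sketch below is only a simplified model of the intended behavior for the multi-source case: once the limit is satisfied, later batches are dropped rather than forwarded as empty batches with eos set. `LimitNode`, its fields, and the `(rows, eos)` batch shape are hypothetical and do not mirror the real exec node.

```python
from typing import List, Optional, Tuple

Batch = Tuple[List[int], bool]  # (rows, eos flag)


class LimitNode:
    def __init__(self, limit: int, num_sources: int) -> None:
        self.limit = limit
        self.emitted = 0
        self.open_sources = num_sources
        self.finished = False

    def consume(self, rows: List[int], source_eos: bool) -> Optional[Batch]:
        """Return the batch to forward downstream, or None to forward nothing."""
        if source_eos:
            self.open_sources -= 1
        if self.finished:
            return None  # limit already satisfied: drop the batch entirely
        out = rows[: self.limit - self.emitted]
        self.emitted += len(out)
        # eos goes downstream exactly once: at the limit, or when all sources end.
        self.finished = self.emitted >= self.limit or self.open_sources == 0
        if not out and not self.finished:
            return None  # nothing to forward yet
        return (out, self.finished)
```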
      
      Test Plan: I added tests for both cases. As expected, the second case's test fails on master, and both pass with this diff.
      
      Reviewers: #engineering, philkuz
      
      Reviewed By: #engineering, philkuz
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6118
      
      GitOrigin-RevId: b20f695ef286c14880b765e791bb1bc27ea62ff4
  22. 31 Aug, 2020 6 commits
    • Change dynamic tracing IR protobufs visibility to stirling only · fe4fd24b
      Yaxiong Zhao authored
      Test Plan: Jenkins
      
      Reviewers: oazizi, #engineering
      
      Reviewed By: oazizi, #engineering
      
      Subscribers: philkuz
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6127
      
      GitOrigin-RevId: a13bbc51b798f4bd4ef42d30a5594ce801bedce6
    • Testing artifacts for minikube with kvm driver · ccd59cac
      Yaxiong Zhao authored
      Summary: Yamls for go_grpc_client and server
      
      Test Plan: Manual
      
      Reviewers: oazizi, #engineering
      
      Reviewed By: oazizi, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6111
      
      GitOrigin-RevId: 8b2f1f635e8356a398442cba7ead32b6d28c4428
    • Manual bazel tests for px on GKE · 6f2a8393
      Omid Azizi authored
      Summary: Encode the steps of testing different environments into scripts.
      
      Test Plan: Manual
      
      Reviewers: yzhao, #engineering
      
      Reviewed By: yzhao, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6121
      
      GitOrigin-RevId: d5688e128de314c5dd717dc9661281c4103e99e2
    • Move parts of String/ByteArray processing from CodeGen to Dwarvifier · ad2cb1ca
      Omid Azizi authored
      Summary:
      String/ByteArray support was done very quickly for the demo, so some short-cuts were taken.
      
      This diff rectifies part of that. It is a step in the right direction; there is more to do.
      
      Test Plan: Existing tests.
      
      Reviewers: yzhao, #engineering
      
      Reviewed By: yzhao, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6123
      
      GitOrigin-RevId: 87d558c15c54abb3a61221b0e052b151933773ac
    • Fix updater timeout commit · 11bc6f4d
      Michelle Nguyen authored
      Summary: the wrong version got committed: one from when I was testing different ways to handle the timeout error, instead of the actual solution that was in the diff.
      
      Test Plan: n/a
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6132
      
      GitOrigin-RevId: bf22569b64cf79e4655d3ba659c9766c5073a7a7
    • Update job: Keep cloud-connector alive · dcd9b993
      Michelle Nguyen authored
      Summary:
      We want to keep the cloud connector alive as much as possible, so that we always have a way to recover without requiring a fresh reinstall from our customers.
      The current update flow is:
      - delete all vizier resources (this includes cloudconn + its deps, minus etcd/nats)
      - launch new versions of all vizier resources
      Between deletion and launching, the cloud connector deployment is completely gone from the namespace. if something happens in that time, it is possible that we may never get the cloud connector back.
      
      Instead, the flow is updated to be more like this:
      - delete all vizier resources (minus cloudconn + its deps)
      - launch new versions of all vizier resources. for cloudconn + its deps, this is an update.
      Now, there is no period of time where there is no cloud connector deployment.
      
      We need to additionally bounce the cloudconnector pod to handle a case where we try to update to the same version we're already on. since this is an update now, the cloudconnector pod just keeps running. however, we expect the cloudconnector to clean up the update job upon startup.
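
The revised flow amounts to splitting resources into "delete then recreate" and "update in place". A small sketch of that planning step, with hypothetical resource names and a hypothetical `plan_update` helper (the real updater works against Kubernetes objects, not strings):

```python
from typing import Dict, List, Set


def plan_update(current: Set[str], desired: Set[str],
                keep_alive: Set[str]) -> Dict[str, List[str]]:
    """Decide what to delete, update in place, and create for one update run."""
    # Resources in keep_alive are never deleted, so they never disappear.
    delete = sorted(r for r in current if r not in keep_alive)
    # Kept resources that already exist are updated rather than recreated.
    update = sorted(r for r in desired if r in keep_alive and r in current)
    # Everything else in the desired set is launched fresh.
    create = sorted(r for r in desired if r not in update)
    return {"delete": delete, "update": update, "create": create}
```

Because an in-place update to the same version leaves the running pod untouched, this plan alone would not restart the cloud connector, which is why the commit bounces the pod explicitly.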
      
      Test Plan:
      tried 4 cases:
      - update 0.4.5 to RC with new update changes
      - deploy RC with update changes
      - update RC to same RC version
      - update RC to newer RC version
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6129
      
      GitOrigin-RevId: 365e9e74707e5d89a7f9c8794f8830635cf524fb
  23. 28 Aug, 2020 2 commits
    • Fix updater timeout issue · 7ea98e01
      Michelle Nguyen authored
      Summary:
      turns out the previous fix didn't work, because the error wasn't actually the type I expected it to be.
      turns out there's no specific error type that we can check for this timeout error, so we have to check it manually.
      
      Test Plan: updated the job timeout to be 2s instead of 10 mins. confirmed that my cluster updated properly anyway
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6112
      
      GitOrigin-RevId: 5eaebfd5049f1ff5ab9e4badce9ea8d19ff7c406
    • Fix create release script bazel path · b2bb3e7e
      Michelle Nguyen authored
      Summary:
      when we made our vizier images public/private we removed :vizier_images_bundle and replaced it with :public_vizier_images_bundle and :private_vizier_images_bundle.
      This broke the release script for rcs because it couldn't properly get the number of changed commits and the name would always be something like 0.4.6-pre-master.0, which made creating multiple rcs from the same branch very annoying
      
      Test Plan: ran it
      
      Reviewers: zasgar, #engineering
      
      Reviewed By: zasgar, #engineering
      
      Differential Revision: https://phab.corp.pixielabs.ai/D6126
      
      GitOrigin-RevId: 3bae66f02ca08924a909875440547a2567c27554