To get a comprehensive overview of their codebases, an engineering leader relies on a large set of tools. Typically, there’s software to inspect code quality: identifying bugs, vulnerabilities, code smells, and technical debt. Another set of tools provides real-time insight into application performance, including log monitoring. Depending on the apps, there may also be tools that analyze the user experience. There may be dependency tracking and management tools as well, not to mention external sources of data, such as product documentation and third-party lists of vulnerable libraries. All of these tools spit out reports, but an engineering manager doesn’t want a report. They want answers to their pressing questions: the kind of answers they would get if they asked a senior engineer to walk them through the important parts of the codebase.
These tools produce a deluge of information. Just finding and tracking one module of code through all of the artifacts is burdensome, never mind reconciling the different formats. For individual engineers, CI/CD integrations focus the tools’ reports on the code in the PR, but the engineering leader needs a holistic view of the entire codebase. Rather than deep questions about a small subset of the repos, they have broad questions across all of them. Answering those questions takes either a highly trained engineer or a fair amount of post-processing code to normalize the data and extract what’s relevant; both options are expensive.
To scaffold our discussion of these holistic questions, we will use a two-dimensional taxonomy. One dimension is the input data sources; the other is the method used to answer the question, simplified here to whether or not an LLM is employed. We will use the question of third-party library imports to illustrate how various questions fit into the taxonomy.
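To make the taxonomy concrete, here is a minimal sketch (in Python, purely for illustration) that places variants of the library-import question into the four cells. The axis labels and example questions are our own phrasing of the categories discussed below.

```python
# A small illustration of the two-dimensional taxonomy, using variants of the
# third-party library question. Axis labels and wording are illustrative only.
TAXONOMY = {
    # (data sources, uses LLM?) -> example question
    ("code only", False): "Which third-party libraries does each repo import?",
    ("code only", True): "What does the code that uses a given library actually do?",
    ("code + external data", False): "Which of our imports carry restrictive licenses?",
    ("code + external data", True): "Which deprecated or license-restricted imports "
                                    "sit in business-critical code paths?",
}

for (sources, uses_llm), question in TAXONOMY.items():
    print(f"[{sources} | LLM: {uses_llm}] {question}")
```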
One category of questions can be answered with static analysis tools alone. For example, if a leader wants to find dependencies between modules, they can choose from a number of tools. To get a holistic view across their organization’s repos, they will likely need several, as most only cover a small subset of programming languages. If the report comes back “clean,” that is probably sufficient. However, in any large enough organization, there are likely to be some red flags (blame the intern). To gauge the risk and impact, the leader needs to know how and where the libraries are being used. A static analysis tool can point them toward the relevant code, but it cannot describe what the code employing the library actually does.
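As a minimal sketch of this static-analysis-only category, the snippet below walks a repository and lists third-party imports using Python’s standard ast module. Real tools cover many languages and much more detail; the point is only the shape of the answer a leader gets back.

```python
# A minimal sketch of the "static analysis only" category: walk a repo and list
# third-party imports using Python's standard ast module (Python 3.10+).
import ast
import sys
from pathlib import Path

def third_party_imports(repo_root: str) -> dict[str, set[str]]:
    """Map each top-level imported package to the files that import it."""
    stdlib = sys.stdlib_module_names  # names of standard-library modules
    found: dict[str, set[str]] = {}
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                names = [node.module]
            else:
                continue
            for name in names:
                top = name.split(".")[0]
                if top not in stdlib:  # note: local project packages also land here
                    found.setdefault(top, set()).add(str(path))
    return found

if __name__ == "__main__":
    for package, files in sorted(third_party_imports(".").items()):
        print(f"{package}: imported in {len(files)} file(s)")
```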
In many cases, to get a truly meaningful answer, the leader will want to enhance the static analysis output with summarization from an LLM. For example, a simple static analysis report can list third-party imports, but an LLM given the relevant code snippets can determine what that code does and paint a much richer picture.
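A rough sketch of that enrichment step might look like the following. The OpenAI client is used purely as an example provider, and the model name and file paths are placeholders; any capable model would do.

```python
# A sketch of the "static analysis + LLM" category: take the snippets a static
# analyzer flagged as using a library and ask an LLM what that code does.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_library_usage(library: str, snippet_paths: list[str]) -> str:
    snippets = "\n\n".join(
        f"--- {p} ---\n{Path(p).read_text(encoding='utf-8')}" for p in snippet_paths
    )
    prompt = (
        f"The following code uses the third-party library '{library}'.\n"
        "In a few sentences, describe what this code does with the library, "
        "how central the library is to it, and how risky a replacement would be.\n\n"
        f"{snippets}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example call (the path is hypothetical):
# print(summarize_library_usage("requests", ["services/billing/client.py"]))
```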
Likewise, additional code-adjacent data sources can provide more context than code alone. Generic data sources, external to the organization, can be quite useful. For example, a package metadata index like libraries.io puts those libraries in context, with information about licenses, deprecation, and so on. Similarly, augmenting the code with information about the system architecture allows a much deeper view of code functionality. Architectural details may be pulled directly from the code (e.g. from Terraform configuration), taken from external documentation, or entered manually by engineers. Likewise, performance data from third-party tools can tie monitoring and logging information back to the code. Lastly, some information may be institutional knowledge captured in other systems, such as Jira or product documentation. No matter how the external data gets into the system, a human should be able to curate and confirm it before and after it is passed to the LLM.
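As one concrete illustration, the sketch below pulls license and lifecycle context for a single dependency from libraries.io. The endpoint shape follows its public project API, but the response field names are assumptions that should be checked against the documentation, and the API key is a placeholder.

```python
# A sketch of enriching dependency data with an external source (libraries.io).
# Field names such as "normalized_licenses" and "status" are assumptions; verify
# them against the libraries.io API docs before relying on them.
import os
import requests

LIBRARIES_IO_KEY = os.environ["LIBRARIES_IO_API_KEY"]  # placeholder env var

def package_context(platform: str, name: str) -> dict:
    """Fetch license and lifecycle context for one dependency."""
    url = f"https://libraries.io/api/{platform}/{name}"
    resp = requests.get(url, params={"api_key": LIBRARIES_IO_KEY}, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    return {
        "name": name,
        "licenses": data.get("normalized_licenses", []),
        "latest_release": data.get("latest_release_number"),
        "status": data.get("status"),  # e.g. a deprecation marker for retired packages
    }

# Example call: package_context("pypi", "requests")
```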
Because the LLM can do simple linkage between data sources (e.g. identifying corresponding fields even when they have different names or formats), ingesting additional data sources is much easier than it would be with traditional approaches. In some cases, the LLM can even perform advanced tasks like semantic linking, data cleansing, and entity resolution, although a human should closely review the results to catch mistakes. This is particularly true when a question straddles multiple taxonomy categories and pulls in multiple code-adjacent data sources.
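A hedged sketch of that linkage step is below, with hypothetical record shapes and the same example LLM client as before; in practice, the proposed match would go to a human for review.

```python
# A sketch of LLM-assisted linkage between two code-adjacent sources whose
# fields don't line up. The record shapes are hypothetical; a human should
# review every proposed match before it is trusted.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def link_records(finding: dict, ticket: dict) -> dict:
    prompt = (
        "Below are a static-analysis finding and an issue-tracker ticket, from "
        "different tools with different field names. Decide whether they refer "
        "to the same underlying problem. Reply with JSON: "
        '{"same_issue": true/false, "matched_fields": {...}, "reason": "..."}\n\n'
        f"Finding: {json.dumps(finding)}\n"
        f"Ticket: {json.dumps(ticket)}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Hypothetical inputs:
# finding = {"rule": "vulnerable-dependency", "component": "billing-svc", "lib": "log4j"}
# ticket = {"summary": "Upgrade log4j in billing service", "project": "BILL"}
# print(link_records(finding, ticket))
```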
However, while integrating data is helpful, it doesn’t completely solve the problem for the engineering manager, who still needs to find the high-level signal in all the low-level noise. Flux is the agent that sifts the most important and urgent alerts out of the flood the tools produce and organizes the responses by topic area. For example, all of the code quality information ends up in one place, in one format, regardless of programming language or tool. Flux also highlights the most important things for a manager to know in an executive summary.
The responsible engineering manager doesn’t just find the problems; they figure out how to solve them. They want to dig deeper to understand the extent of each problem. For example, static analysis tools can tell them which libraries in the code are vulnerable, but Flux can explain where in the code each one is used, what that code does, and how difficult it would be to remediate.
To learn more, book a demo with a member of our team today!
Rachel Lomasky is the Chief Data Scientist at Flux, where she continuously identifies and operationalizes AI so Flux users can understand their codebases. In addition to a PhD in Computer Science, Rachel applies her 15+ years of professional experience to augment generative AI with classic machine learning. She regularly organizes and speaks at AI conferences internationally; keep up with her on LinkedIn.