Show HN: Musoq – Query Anything with SQL Syntax (Git, C#, CSV, Can DBC)

github.com

56 points · Puchaczov · 11 days ago

Hey! For those of you who don't know my little tool Musoq: it lets you query data with SQL-like syntax without any database.

It can query all sorts of things: niche ones like CAN DBC files, weird ones like C# code, interesting ones like Git repositories, and regular stuff like CSV, TSV and various others.

I experiment quite a bit, so I'm hybridizing the engine with LLMs and doing other weird stuff that is more or less practical :-)

I also wanted to share some recent developments in this little project, as I hope they might be interesting to some of you.

New Experimental Plugins:

* Git Plugin (Beta): I've been working on Git repository querying. I managed to test it on the EF Core repo (16k commits) and it seems to work okay

* Roslyn Plugin (Beta): Added basic C# code analysis capabilities

For the very first time: I've extended CROSS APPLY to use computed results as arguments! Now the operator can use values from the current row as inputs. Here's an example:

  SELECT
    f.DirectoryName,
    f.FileName
  FROM #os.directories('/some/path', false) d
  CROSS APPLY #os.files(d.FullName, true) f
  WHERE d.Name IN ('Folder1', 'Folder2')
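
For readers less familiar with the operator, here is a rough Python sketch of what a correlated CROSS APPLY does conceptually: the right-hand source is re-evaluated for every left-hand row, using values from that row. This is not how Musoq implements it; the helper names (`cross_apply`, `directories`, `files`, `query`) are made up for illustration, and the exact columns Musoq returns may differ.

```python
import os


def cross_apply(left_rows, right_for):
    """Pair each left row with every row produced by right_for(left_row).

    Left rows whose right side is empty produce no output at all.
    """
    for left_row in left_rows:
        for right_row in right_for(left_row):
            yield left_row, right_row


def directories(path):
    """Immediate subdirectories of path as (name, full_path) pairs."""
    with os.scandir(path) as entries:
        return sorted((e.name, e.path) for e in entries if e.is_dir())


def files(path):
    """All files below path, recursively, as (directory_path, file_name) pairs."""
    return sorted(
        (root, name) for root, _dirs, names in os.walk(path) for name in names
    )


def query(base, wanted_names):
    """Rough equivalent of the SQL above: filter directories, then list their files."""
    # Filtering before the apply gives the same rows as the WHERE clause would.
    selected = (d for d in directories(base) if d[0] in wanted_names)
    # d[1] is the directory's full path, fed into the right-hand source per row.
    return [f for _d, f in cross_apply(selected, lambda d: files(d[1]))]
```

Calling `query('/some/path', {'Folder1', 'Folder2'})` would return one `(directory_path, file_name)` pair per file found under the two selected folders.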

After another batch of fixes, I'm finally able to query multiple Git repositories AT ONCE!

  with ProjectsToAnalyze as (
    select
        dir2.FullName as FullName
    from #os.directories('D:\repos', false) dir1
    cross apply #os.directories(dir1.FullName, false) dir2
    where
        dir2.Name = '.git'
  )
  select
    c.Message,
    c.Author,
    c.CommittedWhen
  from ProjectsToAnalyze p cross apply #git.repository(p.FullName) r 
  cross apply r.Commits c
  where c.AuthorEmail = 'my-email@email.ok'
  order by c.CommittedWhen desc
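
To make the shape of that query concrete, here is a hedged Python sketch of the same idea done by hand: find every `.git` directory sitting directly inside a subfolder of a root, then ask the `git` CLI for one author's commits in each. It assumes `git` is on the PATH; the function names are hypothetical and this has nothing to do with how the Git plugin works internally. Note that `git log --author` does substring matching on the author line, which is looser than the `=` comparison in the query.

```python
import os
import subprocess


def find_git_dirs(root):
    """Return paths of '.git' directories located at root/<folder>/.git."""
    found = []
    for name in sorted(os.listdir(root)):
        candidate = os.path.join(root, name, ".git")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


def commits_by_author(git_dir, email):
    """Return (unix_time, author_name, subject) tuples for one repository."""
    result = subprocess.run(
        [
            "git",
            "--git-dir=" + git_dir,
            "log",
            "--fixed-strings",
            "--author=" + email,
            "--format=%ct%x09%an%x09%s",  # tab-separated fields
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:  # e.g. a repository without any commits
        return []
    rows = []
    for line in result.stdout.splitlines():
        timestamp, author, subject = line.split("\t", 2)
        rows.append((int(timestamp), author, subject))
    return rows


def commits_across_repos(root, email):
    """All matching commits from every repository under root, newest first."""
    rows = []
    for git_dir in find_git_dirs(root):
        rows.extend(commits_by_author(git_dir, email))
    return sorted(rows, key=lambda row: row[0], reverse=True)
```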

Under the Hood:

- Added a Buckets feature for memory management (currently just testing it with the Roslyn plugin)

- Moved to .NET 8

- Added CROSS/OUTER APPLY operators

- Made some improvements to error messages and runtime behavior
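
On the CROSS/OUTER APPLY item above: a small Python sketch of the usual difference between the two, assuming Musoq follows the T-SQL meaning (an assumption on my part, not something the list states). CROSS APPLY drops left rows whose right side is empty; OUTER APPLY keeps them and fills the right side with a null. The data and function names are illustrative only.

```python
def cross_apply(left_rows, right_for):
    """Yield (left, right) pairs; left rows with no right rows are dropped."""
    for left_row in left_rows:
        for right_row in right_for(left_row):
            yield left_row, right_row


def outer_apply(left_rows, right_for):
    """Like cross_apply, but keep unmatched left rows paired with None."""
    for left_row in left_rows:
        matched = False
        for right_row in right_for(left_row):
            matched = True
            yield left_row, right_row
        if not matched:
            yield left_row, None


# Illustrative data: one customer has orders, the other has none.
orders = {"ann": ["book", "pen"], "bob": []}
customers = ["ann", "bob"]

cross_rows = list(cross_apply(customers, lambda c: orders[c]))
outer_rows = list(outer_apply(customers, lambda c: orders[c]))
```

Here `cross_rows` contains only ann's two orders, while `outer_rows` additionally contains `("bob", None)`.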

New piping features: I've been experimenting with piping capabilities:

* Image Analysis with LLMs:

  ./Musoq.exe image encode "image.jpg" | ./Musoq.exe run query "select s.Shop, s.ProductName, s.Price from ..."

* Text Data Extraction:

  Get-Content "ticket.txt" | ./Musoq.exe run query "select t.TicketNumber, t.CustomerName ... from #stdin.text('Ollama', 'llama3.1') t"

* Data Source Combination:

  { docker image ls; ./Musoq.exe separator; docker container ls } | ./Musoq.exe run query "..."
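
The general idea behind combining sources through a single pipe can be sketched in Python as follows: several command outputs arrive as one stream, and a marker line tells the reader where one source ends and the next begins. What `./Musoq.exe separator` actually prints and how the engine consumes it is not shown here, so the marker string below is purely hypothetical.

```python
def split_sources(lines, separator):
    """Split a stream of lines into per-source lists at each separator line."""
    sources = [[]]
    for line in lines:
        text = line.rstrip("\n")
        if text == separator:
            sources.append([])  # start collecting the next source
        else:
            sources[-1].append(text)
    return sources


# Hypothetical marker; the real separator emitted by the tool may look different.
HYPOTHETICAL_SEPARATOR = "--SEP--"

combined = ["image-a\n", "image-b\n", "--SEP--\n", "container-x\n"]
images, containers = split_sources(combined, HYPOTHETICAL_SEPARATOR)
```

In a real pipeline the `lines` argument would be `sys.stdin`, and each resulting list would be parsed as its own table.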

I'm working on comprehensive documentation. I especially encourage you to look at the sections "Practical Examples and Applications" and "Data Sources", where you can see all the tables the tool currently provides: <https://puchaczov.github.io/Musoq/>

Other Changes:

- Made some improvements to OS and Archive data sources (OS can now query metadata like EXIF)

- Added a few fields to CAN DBC plugin

- Command outputs can now be used as inputs for queries

I'm hoping to:

- Improve stability and add more tests

- Flesh out the documentation

- Work on package distribution (Scoop, Ubuntu packages)

- Share some examples of source code querying with Roslyn

Ideas for later:

- Robust WHERE analysis and optimizations

- DISTINCT operator implementation

- PROTOBUF schema support

- Performance improvements

- Query parallelization

- Recursive CTEs

- Subqueries

I'd really appreciate any thoughts or feedback!

The documentation section where I wrote a short analysis of EF Core with the Git plugin: <https://puchaczov.github.io/Musoq/practical-examples-and-app...>


15 comments
lathiat · 11 days ago
This looks awesome. As someone working in support for a wide array of Linux Apps, and data dumps from customers where I have no access to the system, plus I also write or backport bug fixes to all sorts of random software, I often want to do this kind of crazy stuff. With exactly these kinds of artefacts.

snthpy · 11 days ago
Very cool!

How does this interface with the different tools and how would one add another tool for it to operate on?

I started on something similar last year which was just a simple bash script to interact with things like osquery. Alas it was too buggy for what I wanted to do and it's paused indefinitely for now.

cryptoalex · 11 days ago
Hey, I like your project, earned a star from me! When time allows, I will take it for a test drive to see how exactly it works with Roslyn/C# data. My C# solution has grown to about 80 projects, so it would be a good test.

re_spond · 11 days ago
Where would you place this between osquery and steampipe? It seems to borrow concepts from both sides, but I'm not sure why it couldn't be a plugin for either.

johnthescott · 10 days ago
In PostgreSQL, the syntax for data types is stored in the database. How would this tool parse GIS expressions, for example?
