feat: add whitelist char support to nonascii check #100

Merged
manuel merged 5 commits from feat_repo-healthcheck-whitelist-char into master 2026-04-26 08:23:34 +08:00

This commit adds support for whitelisted characters to the non-ASCII file check of the repo healthcheck. It is enabled by an extra switch to `repo-health-checker`, `-whitelistedChars`. The argument takes a comma-separated list of non-ASCII characters and ignores them during the repo healthcheck. Invalid command-line input is reported through the logger.

Co-Authored-By: GitHub Copilot <noreply@microsoft.com>

Copilot Prompt
This is a repo for an online judge orchestrator system «JOJ3». Under `cmd/` lies a source directory for a Go command, `repo-health-checker`. You can tell from its name that it checks the repo for things like repo size, commit messages, non-ASCII character usage, etc. before sending the work to the actual judging and grading system.

Now, I want the non-ASCII character checking function of the repo health checker to be flexible - it shall accept a list of non-ASCII characters and deem them acceptable.

Your task

  • Accept this new cmdline arg. In `cmd/repo-health-checker/main.go`, accept a new command line flag `-whitelisted-chars`, which shall take exactly one string of comma-separated non-ASCII characters. This string shall be passed to the actual healthcheck package.
  • Respect this list while scanning the files. In `pkg/healthcheck/nonascii.go`, function `getNonASCII()`, we use a bufio Scanner to scan through all files for non-ASCII characters. We would like the list of acceptable chars to be passed from the cmdline to here, and the scanner logic modified to actually accept the corresponding characters.
  • Error handling and reporting. This command line arg, `-whitelisted-chars`, could be completely absent; in which case, no characters shall be whitelisted by default. The comma-separated list passed to the command may contain ASCII characters or multiple characters that are not properly separated; in which case, ignore that element, and report the incident via the SLog logging framework used in this project.
  • Test your work. Create new testcases under `examples/healthcheck/` to reflect this change. Refer to `examples/healthcheck/asciifile/` to learn how to configure the repo health checker. Integrate your work into the Go test framework such that it can be invoked by running `make test` at the terminal.
    • Note: Use `git init` to init your testcase directory and make an initial commit - this project, JOJ3, only runs in Git repos.
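
The parsing and scanning rules in the task list above can be sketched roughly as follows. This is a hypothetical illustration, not the merged code: `parseWhitelistedChars` is named after the function mentioned later in this thread, but its signature here is assumed, and `firstDisallowed` is an invented stand-in for the scanner logic in `getNonASCII()`. It assumes Go 1.21+ for `log/slog`.

```go
package main

import (
	"bufio"
	"fmt"
	"log/slog"
	"strings"
	"unicode/utf8"
)

// parseWhitelistedChars turns a comma-separated list such as "é,ñ" into a set
// of runes. An empty string (flag absent) yields an empty set. Elements that
// are empty, ASCII, or longer than one character are skipped and logged.
// Assumed signature; the real function may differ.
func parseWhitelistedChars(csv string) map[rune]struct{} {
	allowed := make(map[rune]struct{})
	if csv == "" {
		return allowed
	}
	for _, elem := range strings.Split(csv, ",") {
		r, size := utf8.DecodeRuneInString(elem)
		switch {
		case (r == utf8.RuneError && size <= 1) || size != len(elem):
			// Empty, invalid UTF-8, or more than one character.
			slog.Warn("ignoring malformed whitelist element", "element", elem)
		case r < utf8.RuneSelf:
			slog.Warn("ignoring ASCII whitelist element", "element", elem)
		default:
			allowed[r] = struct{}{}
		}
	}
	return allowed
}

// firstDisallowed scans text line by line and returns the first non-ASCII
// rune that is not in allowed, together with its 1-based line number.
// Hypothetical helper standing in for the real scanner loop.
func firstDisallowed(text string, allowed map[rune]struct{}) (rune, int, bool) {
	scanner := bufio.NewScanner(strings.NewReader(text))
	for line := 1; scanner.Scan(); line++ {
		for _, r := range scanner.Text() {
			if r < utf8.RuneSelf {
				continue // plain ASCII is always fine
			}
			if _, ok := allowed[r]; !ok {
				return r, line, true
			}
		}
	}
	return 0, 0, false
}

func main() {
	// "a" is ASCII and "ñü" is two characters, so only "é" is accepted.
	allowed := parseWhitelistedChars("é,a,ñü")
	r, line, found := firstDisallowed("café\nnaïve\n", allowed)
	fmt.Println(string(r), line, found)
}
```

A set keyed by rune keeps the per-character lookup in the scan loop cheap, and rejecting bad elements one by one (rather than failing the whole flag) matches the "ignore that element, and report the incident" requirement.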

Notes

  • Directory structure. `cmd/` for invokable commands, `pkg/` for the actual logic, `internal` - something you don't need to worry about.
  • JOJ3 vs. Health Check. `joj3` is a separate executable; in this session we are only working on the `repo-health-checker`.
  • Extras. Make sure to read `README.md` and the directory structure before you go; also, create a to-do list before you execute your plan.
王韵晨520370910012 added 1 commit 2026-04-08 11:56:48 +08:00
feat: add whitelist char support to nonascii check
Some checks failed
build / build (push) Failing after 9m18s
build / trigger-build-image (push) Has been skipped
build / build (pull_request) Failing after 11m13s
build / trigger-build-image (pull_request) Has been skipped
6496435891
This commit brings support of whitelisted characters during repo
healthcheck/non-ascii file check. Supported by an extra switch to
`repo-health-checker`, `-whitelistedChars`. The argument takes a
comma-separated list of non-ASCII characters and ignores them during
repo healthcheck. Illegal cmdline input is logged by the logger.
王韵晨520370910012 added the enhancement component framework labels 2026-04-08 11:57:58 +08:00
王韵晨520370910012 added 1 commit 2026-04-08 12:24:59 +08:00
chore: add git submodules for nonascii tests
Some checks failed
build / trigger-build-image (push) Blocked by required conditions
build / build (pull_request) Failing after 14m15s
build / trigger-build-image (pull_request) Has been skipped
build / build (push) Failing after 28m27s
8e8719d80b
王韵晨520370910012 requested review from manuel 2026-04-08 12:25:29 +08:00
王韵晨520370910012 requested review from 张泊明518370910136 2026-04-08 12:25:30 +08:00
王韵晨520370910012 changed title from feat: add whitelist char support to nonascii check to WIP: feat: add whitelist char support to nonascii check 2026-04-08 12:38:52 +08:00
王韵晨520370910012 removed review request for manuel 2026-04-08 12:38:57 +08:00
王韵晨520370910012 removed review request for 张泊明518370910136 2026-04-08 12:38:58 +08:00
王韵晨520370910012 added 1 commit 2026-04-08 14:08:45 +08:00
fix: invalid test cases
Some checks failed
build / build (push) Failing after 29m41s
build / build (pull_request) Failing after 29m39s
build / trigger-build-image (push) Has been cancelled
build / trigger-build-image (pull_request) Has been cancelled
2a501f7cf6
This commit fixes two test cases:
- Whitelisted chars (success), where the config and expected JSON files
  were misconfigured;
- Whitelisted chars (invalid). This test was removed. Since stderr is
  preserved with execution in the sandbox, and stderr contains the
  "original" bad non-ASCII characters that are filtered, this creates a
  paradox. Thus, the test case is removed for now, pending investigation
  into this matter.
王韵晨520370910012 changed title from WIP: feat: add whitelist char support to nonascii check to feat: add whitelist char support to nonascii check 2026-04-08 14:09:08 +08:00
王韵晨520370910012 requested review from manuel 2026-04-08 14:09:14 +08:00
王韵晨520370910012 requested review from 张泊明518370910136 2026-04-08 14:09:15 +08:00

Since it just reads comma-separated strings, I think we can just pass it as a `[]string`, and there is no need to name it `xxxCSV`, as that is a bit confusing?

王韵晨520370910012 added 1 commit 2026-04-17 23:28:27 +08:00
chore: rename whitelistedCharsCSV to whitelistedChars
Some checks failed
build / build (pull_request) Failing after 29m57s
build / build (push) Failing after 4m11s
build / trigger-build-image (push) Has been skipped
build / trigger-build-image (pull_request) Has been cancelled
8b76780c98
Author
Member

> Since it just reads comma-separated strings, I think we can just pass it via `[]string`, and no need to name it as `xxxCSV` as it is a bit confusing?

Modified - removed "CSV" from "whitelistedCharsCSV". Note that it's passing a string, not a []string since it's from the command line.
Also note that in function parseWhitelistedChars, variable name "csv" is kept as-is to reflect the fact that it's a comma-separated list.
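
To illustrate why the value arrives as a single `string`, here is a minimal sketch using Go's standard `flag` package. It is an assumption-laden example, not the code in `cmd/repo-health-checker/main.go`: `parseArgs` is a hypothetical helper, and the flag name follows the prompt's `-whitelisted-chars` spelling (the PR description spells it `-whitelistedChars`).

```go
package main

import (
	"flag"
	"fmt"
)

// parseArgs registers one string flag and parses args. The flag package hands
// back the raw comma-separated text; splitting it into individual characters
// is left to the healthcheck side. Hypothetical helper for illustration only.
func parseArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("repo-health-checker", flag.ContinueOnError)
	// Flag name is an assumption taken from the prompt.
	whitelistedChars := fs.String("whitelisted-chars", "",
		"comma-separated list of allowed non-ASCII characters")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *whitelistedChars, nil
}

func main() {
	chars, err := parseArgs([]string{"-whitelisted-chars", "é,ñ"})
	fmt.Printf("%q %v\n", chars, err)
}
```

When the flag is omitted, the default empty string comes back, which lines up with the "no characters whitelisted by default" behavior described in the prompt.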

张泊明518370910136 approved these changes 2026-04-18 14:08:34 +08:00
Dismissed
张泊明518370910136 left a comment
Owner

LGTM

Author
Member

@bomingzh Thanks for the review :)
However I don't have the permission to merge :(


:-( I merged things in the wrong order, so it has conflicts now.

王韵晨520370910012 added 1 commit 2026-04-23 06:31:09 +08:00
Merge branch 'master' into feat_repo-healthcheck-whitelist-char
Some checks failed
build / build (pull_request) Failing after 27m5s
build / trigger-build-image (pull_request) Has been cancelled
build / build (push) Successful in 7m25s
build / trigger-build-image (push) Has been skipped
032f4d18df
王韵晨520370910012 dismissed bomingzh’s review 2026-04-23 06:31:09 +08:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

Merge conflict fixed

张泊明518370910136 approved these changes 2026-04-25 11:58:17 +08:00
张泊明518370910136 removed review request for manuel 2026-04-25 12:06:25 +08:00
manuel merged commit 04ae1c8674 into master 2026-04-26 08:23:34 +08:00