ESP32-S3 Voice Recorder
A portable voice recorder built on the Waveshare ESP32-S3-Touch-AMOLED-2.06 board. Records audio to SD card as WAV files with a touch-driven LVGL UI for recording, playback, and WiFi connectivity.
This README serves as a blueprint for building ESP-IDF applications on this board, documenting every sticking point encountered during development.
Features
- Record voice notes to WAV files on microSD card (16kHz, 16-bit mono)
- Browse and play back recordings through the onboard speaker
- Real-time audio level meter during recording
- WiFi connectivity with network scanning and NTP time sync
- Dark-themed touch UI optimized for the 410x502 rounded AMOLED display
Hardware
| Feature | Detail |
|---|---|
| Board | Waveshare ESP32-S3-Touch-AMOLED-2.06 |
| MCU | ESP32-S3 (dual-core Xtensa LX7, 240 MHz) |
| Flash | 16 MB QSPI |
| PSRAM | 8 MB Octal |
| Display | 2.06" 410x502 AMOLED, SH8601 controller, QSPI interface |
| Touch | FT5x06 capacitive (I2C) |
| Audio codec | ES8311 (I2C control, I2S data) |
| Speaker amp | PA enable on GPIO 46 |
| SD card | SDMMC 1-bit mode |
| Console | USB Serial JTAG |
Pin map (from BSP)
Display (QSPI): CS=12, PCLK=11, D0=4, D1=5, D2=6, D3=7, RST=8
Touch (I2C): SDA=15, SCL=14, INT=38, RST=9 (shared with LCD RST)
Audio (I2S): MCLK=16, SCLK=41, LCLK=45, DOUT=40, DIN=42
Speaker PA: GPIO 46 (active high)
SD card (SDMMC): CLK=2, CMD=1, D0=3
Project structure
```
├── CMakeLists.txt              # Top-level ESP-IDF project file
├── partitions.csv              # Custom partition table (3 MB app)
├── sdkconfig.defaults          # Build configuration
├── DISPLAY_PATCHES.md          # Critical BSP bug documentation
├── main/
│   ├── idf_component.yml       # Component dependencies
│   ├── main.c                  # App entry point
│   ├── storage.c/h             # SD card, WAV file I/O
│   ├── audio_recorder.c/h      # Microphone capture to WAV
│   ├── audio_player.c/h        # WAV playback with PA control
│   ├── wifi_manager.c/h        # WiFi STA, scan, NVS credentials, SNTP
│   ├── ui.c/h                  # LVGL tab view shell
│   ├── ui_record.c/h           # Record tab (button, timer, level meter)
│   ├── ui_recordings.c/h       # Library tab (file list, player controls)
│   └── ui_wifi.c/h             # WiFi tab (scan, connect, keyboard)
└── managed_components/         # Downloaded at build time (gitignored)
```
Building and flashing
Prerequisites
- ESP-IDF v5.4 (activate it with source ~/esp/esp-idf/export.sh)
- Python 3.12+ (used by the ESP-IDF build system)
Install ESP-IDF
```sh
brew install cmake ninja dfu-util
mkdir -p ~/esp && cd ~/esp
git clone -b v5.4 --recursive https://github.com/espressif/esp-idf.git
cd esp-idf && ./install.sh esp32s3
```
First build
```sh
source ~/esp/esp-idf/export.sh
idf.py set-target esp32s3
idf.py build
```
The first build downloads managed components (BSP, LVGL, drivers) into
managed_components/. You must apply the display patches after this step; see
"Critical: Display patches required after every clean build" below.
Flash and monitor
```sh
idf.py -p /dev/cu.usbmodem101 flash monitor
```
The USB serial port may appear as /dev/cu.usbmodem101 or /dev/cu.usbmodem1101
depending on USB enumeration. Check with ls /dev/cu.usb*. If the device is in a
boot loop, the port may disappear and reappear — hold BOOT while pressing RESET to
enter download mode.
Usage
- Insert a microSD card (FAT32 formatted)
- Power on the device — it boots to the Record screen
- Record: Tap the red button to start/stop recording
- Library: Swipe to the Library tab to browse recordings, tap to play
- WiFi: Swipe to the WiFi tab to connect (enables NTP time sync for filenames)
Critical: Display patches required after every clean build
The Waveshare BSP component has two bugs that cause the display to show white bars,
a white screen, or no output at all. These must be patched in managed_components/
after any clean build or component update. Full root-cause analysis is in
DISPLAY_PATCHES.md.
Quick patch steps
After idf.py build downloads managed components, edit one file:
File: managed_components/waveshare__esp32_s3_touch_amoled_2_06/esp32_s3_touch_amoled_2_06.c
In the bsp_display_lcd_init() function, find the lvgl_port_display_cfg_t disp_cfg
struct and change:
```c
// In .flags:
.sw_rotate = false, // was true; saves the 41KB rotation buffer
.buff_dma = true,   // was false; forces allocation from internal DMA memory

// Then DELETE the entire lvgl_port_display_rgb_cfg_t rgb_cfg struct, and
// REPLACE:
//     lv_display_t *disp = lvgl_port_add_disp_rgb(&disp_cfg, &rgb_cfg);
// WITH:
lv_display_t *disp = lvgl_port_add_disp(&disp_cfg);
```
Then rebuild with idf.py build.
Why these patches are necessary
Bug 1 — Wrong LVGL port function: The BSP calls lvgl_port_add_disp_rgb() which
is for RGB parallel panels. For the SH8601 QSPI panel, this calls
esp_lcd_rgb_panel_register_event_callbacks() which blindly casts the panel handle
via __containerof(panel, esp_rgb_panel_t, base), corrupting memory in the
sh8601_panel_t struct. The correct function is lvgl_port_add_disp(), which
registers an on_color_trans_done SPI IO callback for async flush signaling.
Bug 2 — LVGL draw buffer lands in PSRAM: With buff_dma = false, the buffer is
allocated with MALLOC_CAP_DEFAULT. Because SPIRAM_MALLOC_ALWAYSINTERNAL=4096 and
the buffer is ~82KB, it lands in PSRAM. The ESP32-S3 SPI master driver's
spi_device_queue_trans() calls esp_ptr_dma_capable(), which returns false for
PSRAM. The driver then tries to allocate an 82KB temporary copy from internal DMA
memory, fails, and the transfer aborts with "spi transmit (queue) color failed" errors.
What does NOT work (and why)
During development we also tried making lv_disp_flush_ready() unconditional in the
LVGL port flush callback (esp_lvgl_port_disp.c). This appeared to fix display
output initially, but caused DMA buffer corruption — LVGL would start rendering the
next band into the buffer while DMA was still transmitting the current band. Symptoms:
brown horizontal lines, pink color banding, image tearing. The correct approach is to
let the on_color_trans_done IO callback (registered by lvgl_port_add_disp())
signal flush completion after DMA finishes.
sdkconfig.defaults — key settings explained
Display buffer height
```
CONFIG_BSP_DISPLAY_LVGL_BUF_HEIGHT=50
```
The LVGL draw buffer is 410 * BUF_HEIGHT * 2 bytes (RGB565). The default height of
100 produces an 82KB buffer that doesn't fit in internal DMA memory. Setting it to 50
yields a 41KB buffer that fits. Going lower reduces memory pressure but increases the
number of SPI transactions per frame.
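The buffer arithmetic above is easy to redo by hand, but a small host-side C helper makes the trade-off explicit. The 410x502 geometry, 2 bytes per RGB565 pixel, and the extra buffer that software rotation allocates all come from this README; the helper names are hypothetical and are not part of the BSP or esp_lvgl_port.

```c
#include <stdbool.h>
#include <stddef.h>

/* Panel geometry and pixel format, as described in this README. */
#define PANEL_WIDTH_PX   410
#define PANEL_HEIGHT_PX  502
#define BYTES_PER_PIXEL  2   /* RGB565 */

/* Size in bytes of one LVGL draw buffer covering `buf_height` lines. */
static size_t draw_buf_bytes(size_t buf_height)
{
    return (size_t)PANEL_WIDTH_PX * buf_height * BYTES_PER_PIXEL;
}

/* Internal DMA memory needed by the display: one draw buffer, plus a
 * second buffer of the same size when sw_rotate is enabled. */
static size_t display_dma_bytes(size_t buf_height, bool sw_rotate)
{
    size_t one = draw_buf_bytes(buf_height);
    return sw_rotate ? 2 * one : one;
}

/* Number of flushes (draw_bitmap calls) needed for one full frame,
 * rounding up because the last band may be partial. */
static size_t flushes_per_frame(size_t buf_height)
{
    return (PANEL_HEIGHT_PX + buf_height - 1) / buf_height;
}
```

At 50 lines the helper gives 41000 bytes per buffer and 11 flushes per frame; at the default 100 lines it gives 82000 bytes and 6 flushes. Enabling software rotation doubles the internal DMA requirement, which is why the patch also sets sw_rotate = false.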
PSRAM configuration
```
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
CONFIG_SPIRAM_SPEED_80M=y
CONFIG_SPIRAM_BOOT_INIT=y
CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=4096
CONFIG_SPIRAM_MALLOC_RESERVE_INTERNAL=65536
```
SPIRAM_MALLOC_ALWAYSINTERNAL=4096 means any malloc() over 4KB may be placed in
PSRAM. This is the root cause of Bug 2 — the 82KB LVGL buffer gets allocated from
PSRAM, but the SPI DMA controller cannot access PSRAM on ESP32-S3 (no
SOC_PSRAM_DMA_CAPABLE). Always use MALLOC_CAP_DMA or buff_dma = true for any
buffer that will be passed to SPI DMA.
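As a rough illustration of why buff_dma = true matters even after shrinking the buffer, here is a simplified C model of where a plain malloc() may land under SPIRAM_MALLOC_ALWAYSINTERNAL=4096. It follows this README's wording ("any malloc() over 4KB may be placed in PSRAM") and is an assumption about allocator policy, not a reimplementation of the ESP-IDF heap, which has additional fallback rules.

```c
#include <stdbool.h>
#include <stddef.h>

/* From sdkconfig.defaults: CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=4096 */
#define ALWAYSINTERNAL_LIMIT 4096u

/* Simplified model (assumption): a plain malloc() up to the limit is served
 * from internal RAM; anything larger may be placed in PSRAM, which this
 * README notes the ESP32-S3 SPI DMA cannot read. */
static bool plain_malloc_may_use_psram(size_t size)
{
    return size > ALWAYSINTERNAL_LIMIT;
}
```

Under this model even the reduced 41000-byte draw buffer is eligible for PSRAM, so lowering the buffer height alone is not enough. On the device, request DMA-capable memory explicitly with heap_caps_malloc(size, MALLOC_CAP_DMA), and esp_ptr_dma_capable() from esp_memory_utils.h can check a pointer before it is handed to SPI.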
Flash and partition
```
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions.csv"
```
The custom partition table allocates 3 MB for the app, which is needed because LVGL + WiFi + audio produce a ~1.5 MB binary.
Console
```
CONFIG_ESP_CONSOLE_USB_SERIAL_JTAG=y
```
This board has no UART-to-USB bridge — console output goes through the USB Serial JTAG peripheral. If the firmware crashes before USB initialization completes, you won't see any output. Use BOOT mode (hold BOOT + press RESET) to recover.
Architecture notes
Audio recording
Recording runs in a dedicated FreeRTOS task on core 1 at priority 5. The BSP's
esp_codec_dev API handles I2S + ES8311 codec setup. Audio is captured at 16 kHz,
16-bit, mono and written directly to the SD card as a WAV file. The WAV header is
first written with placeholder sizes; when recording stops, the recorder seeks back
to the start of the file and rewrites the header with the final sizes.
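The placeholder-then-finalize pattern can be sketched with standard C stdio, which is also what the ESP-IDF VFS exposes for a FAT-mounted SD card. This is a hypothetical illustration, not the code in storage.c: the function names are invented here, and the 44-byte layout is the canonical RIFF/WAVE PCM header, used with this README's 16 kHz, 16-bit mono format.

```c
#include <stdint.h>
#include <stdio.h>

/* Write `n` bytes of `v` to `f`, least significant byte first. */
static void put_le(FILE *f, uint32_t v, int n)
{
    for (int i = 0; i < n; i++) {
        fputc((int)((v >> (8 * i)) & 0xFF), f);
    }
}

/* Write a canonical 44-byte PCM WAV header. data_bytes = 0 produces the
 * placeholder header written when recording starts. */
static void wav_write_header(FILE *f, uint32_t sample_rate, uint16_t bits,
                             uint16_t channels, uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits / 8));
    fwrite("RIFF", 1, 4, f);
    put_le(f, 36 + data_bytes, 4);             /* RIFF chunk size */
    fwrite("WAVE", 1, 4, f);
    fwrite("fmt ", 1, 4, f);
    put_le(f, 16, 4);                          /* fmt chunk size for PCM */
    put_le(f, 1, 2);                           /* audio format 1 = PCM */
    put_le(f, channels, 2);
    put_le(f, sample_rate, 4);
    put_le(f, sample_rate * block_align, 4);   /* byte rate */
    put_le(f, block_align, 2);
    put_le(f, bits, 2);
    fwrite("data", 1, 4, f);
    put_le(f, data_bytes, 4);                  /* data chunk size */
}

/* On stop: seek back, rewrite the header with the real data size, then
 * return to the end of the file. Returns 0 on success. */
static int wav_finalize(FILE *f, uint32_t sample_rate, uint16_t bits,
                        uint16_t channels, uint32_t data_bytes)
{
    if (fseek(f, 0, SEEK_SET) != 0) return -1;
    wav_write_header(f, sample_rate, bits, channels, data_bytes);
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    return fflush(f);
}
```

Call wav_write_header(f, 16000, 16, 1, 0) when recording starts, append raw PCM while recording, then call wav_finalize() with the number of PCM bytes written. A file that is never finalized keeps the placeholder sizes of 0.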
Key detail: esp_codec_dev_set_in_gain(s_mic_codec, 30.0) sets the microphone input
gain. The ES8311 ADC gain range is codec-dependent — consult the datasheet if the
recording volume is too low or clipping.
Audio playback
Playback also runs on core 1, priority 4 (below recording). The speaker power
amplifier on GPIO 46 must be explicitly enabled before playback and disabled after.
Without pa_enable(true), the speaker produces no sound even though the codec is
sending I2S data.
WiFi and NTP
WiFi credentials are persisted in NVS under the "wifi" namespace. On boot, the
manager checks NVS for saved credentials and auto-connects if found. After connecting,
SNTP time sync runs automatically — this gives recordings proper timestamps instead of
epoch-based filenames.
LVGL display pipeline (QSPI)
Understanding this pipeline is critical for debugging display issues:
```
LVGL render → flush_callback() → esp_lcd_panel_draw_bitmap()
→ panel_sh8601_draw_bitmap()
→ tx_param(CASET): sets column range (polls, blocks until previous DMA done)
→ tx_param(RASET): sets row range
→ tx_color(RAMWR, pixels): queues SPI DMA transactions
→ on_color_trans_done ISR fires when last DMA chunk completes
→ lv_disp_flush_ready(): tells LVGL the buffer is free
```
Key invariant: with a single draw buffer, LVGL must not write to the buffer until
lv_disp_flush_ready() is called. If flush_ready is called too early (before DMA
completes), the buffer is corrupted mid-transfer, producing color artifacts and
tearing.
The tx_param() calls at the start of draw_bitmap drain all in-flight DMA
transactions before sending new commands. This means the PREVIOUS transfer's buffer is
safe by the time the NEXT draw_bitmap starts — but NOT before. The IO callback
mechanism correctly waits for DMA completion.
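The single-buffer invariant can be modeled as a tiny state machine. The following C sketch is a host-side illustration only: draw_buf_state_t and the function names are hypothetical, and the real logic lives in esp_lvgl_port and the esp_lcd SPI IO layer. It contrasts the correct path (flush_ready signalled from the DMA-done callback) with the unconditional flush_ready that this README warns against.

```c
#include <stdbool.h>

/* Hypothetical model of one LVGL draw buffer shared with SPI DMA. */
typedef struct {
    bool dma_in_flight;  /* true while SPI DMA is still reading the buffer */
    bool flush_ready;    /* true once LVGL has been told the buffer is free */
} draw_buf_state_t;

/* Correct flush: queue the DMA transfer and return without signalling. */
static void flush_cb(draw_buf_state_t *s)
{
    s->dma_in_flight = true;
    s->flush_ready = false;
}

/* Broken variant: flush_ready signalled unconditionally right after queuing. */
static void flush_cb_early_ready(draw_buf_state_t *s)
{
    s->dma_in_flight = true;
    s->flush_ready = true;
}

/* Models on_color_trans_done: the last DMA chunk has been sent, so the
 * buffer really is free and flush_ready can be signalled. */
static void on_color_trans_done(draw_buf_state_t *s)
{
    s->dma_in_flight = false;
    s->flush_ready = true;
}

/* LVGL only renders the next band after flush_ready has been signalled. */
static bool lvgl_may_render(const draw_buf_state_t *s)
{
    return s->flush_ready;
}

/* Rendering while DMA still reads the buffer is the failure mode behind
 * the tearing and color banding described in this README. */
static bool would_corrupt(const draw_buf_state_t *s)
{
    return lvgl_may_render(s) && s->dma_in_flight;
}
```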
LVGL safe area for rounded display
The 2.06" AMOLED has rounded corners. Without padding, UI elements at the screen edges are clipped. The fix is padding on the root LVGL screen object:
```c
lv_obj_set_style_pad_top(scr, 20, LV_PART_MAIN);
lv_obj_set_style_pad_bottom(scr, 10, LV_PART_MAIN);
lv_obj_set_style_pad_left(scr, 8, LV_PART_MAIN);
lv_obj_set_style_pad_right(scr, 8, LV_PART_MAIN);
```
Top padding is larger because the display's rounded corners are more pronounced at the top. The bottom tab bar already provides some visual margin from the corner.
Display coordinate offset
The SH8601 panel on this board has an X-axis GRAM offset of 22 pixels:
```c
esp_lcd_panel_set_gap(panel_handle, 0x16, 0); // x_gap=22, y_gap=0
```
The BSP handles this automatically. If you write a custom panel driver, you must include this offset or the image will be shifted horizontally.
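To see what the gap does, here is a C sketch of how a panel driver turns a half-open column range from esp_lcd_panel_draw_bitmap() into the four CASET parameter bytes. It assumes the common esp_lcd driver pattern of adding x_gap to both ends and sending an inclusive end address; caset_params() is a hypothetical helper, not code copied from esp_lcd_sh8601.

```c
#include <stdint.h>

/* x_gap for this board, from the esp_lcd_panel_set_gap() call above. */
#define SH8601_X_GAP 22

/* Fill `out` with CASET parameters for columns [x_start, x_end).
 * The gap shifts both ends into GRAM coordinates; the panel expects an
 * inclusive end column, hence the -1. Values are sent big-endian. */
static void caset_params(int x_start, int x_end, int x_gap, uint8_t out[4])
{
    int gram_start = x_start + x_gap;
    int gram_end = x_end + x_gap - 1;
    out[0] = (uint8_t)((gram_start >> 8) & 0xFF);
    out[1] = (uint8_t)(gram_start & 0xFF);
    out[2] = (uint8_t)((gram_end >> 8) & 0xFF);
    out[3] = (uint8_t)(gram_end & 0xFF);
}
```

For the full 410-pixel width, the sketch addresses GRAM columns 22 to 431 (bytes 00 16 01 AF). Without the gap the panel would be addressed at columns 0 to 409, which produces the horizontal shift described above.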
Common errors and solutions
spi transmit (queue) color failed
The LVGL draw buffer is in PSRAM and the SPI DMA controller can't access it. Fix:
set buff_dma = true in the LVGL port display config and reduce the buffer height so it
fits in internal SRAM (~41KB max safe). See "Critical: Display patches required after
every clean build" above.
White screen / no display output
Cause 1 — BSP patches not applied: Bug 1 (wrong LVGL port function) combined with Bug 2 (PSRAM buffer). All SPI color transfers fail silently. Apply both patches.
Cause 2 — LVGL software shadow rendering triggering watchdog: LVGL v9's
lv_draw_sw_box_shadow renders Gaussian-blur shadows entirely in software. On an
ESP32-S3, even a modest shadow_width of 30px on a single widget can cause the LVGL
render task to exceed the 5-second task watchdog timeout. When this happens, LVGL never
completes rendering the first frame — the display stays white (uninitialized GRAM) and
the serial monitor shows repeated task_wdt errors with taskLVGL and a backtrace
through lv_draw_sw_box_shadow → execute_drawing → dispatch → lv_draw_dispatch_layer.
The LVGL default theme (lv_theme_default_init with dark=true) also applies shadows
to buttons and other widgets, compounding the problem. With multiple shadowed widgets,
the total render time can grow far beyond the watchdog limit.
Fix: Set shadow_width to 0 on all widgets, or use very small values (2-4px max).
Do not use lv_theme_default_init with the dark theme unless you also globally
override shadow styles. If you need a glow or shadow effect, use a colored border
instead — borders are rasterized in a single pass and are orders of magnitude cheaper.
Diagnosis: Connect the serial monitor (idf.py monitor). If you see repeating
task_wdt errors with CPU 1: taskLVGL, decode the backtrace with:
```sh
xtensa-esp32s3-elf-addr2line -fe build/voice_recorder.elf 0xADDR1 0xADDR2 ...
```
If lv_draw_sw_box_shadow appears in the trace, shadow rendering is the culprit.
Brown lines / pink banding / image tearing
lv_disp_flush_ready() is being called before DMA finishes, which lets LVGL
overwrite the draw buffer mid-transfer. Do NOT make lv_disp_flush_ready()
unconditional in the flush callback; let the on_color_trans_done IO callback
handle it.
Boot loop / USB port disappears
If firmware crashes before USB Serial JTAG initializes, the port won't enumerate. Enter download mode: hold BOOT, press RESET, release BOOT. Then flash working firmware.
Not enough memory for LVGL buffer (buf1) allocation!
The draw buffer doesn't fit in internal DMA memory. Reduce
CONFIG_BSP_DISPLAY_LVGL_BUF_HEIGHT in sdkconfig.defaults. At 50 lines the buffer
is 41KB. Also ensure sw_rotate = false — software rotation allocates a second buffer
of the same size.
SD card ESP_ERR_TIMEOUT
The SD card slot uses SDMMC in 1-bit mode. If no card is inserted,
bsp_sdcard_mount() returns a timeout error. The app handles this gracefully —
recording is disabled but the UI remains functional.
Build fails with ninja: error: build.ninja: No such file or directory
The build directory is corrupted (usually from a failed set-target or interrupted
build). Fix:
```sh
rm -rf build sdkconfig
idf.py set-target esp32s3
idf.py build
```
Then re-apply display patches.
CMake symlink errors / Could not create symbolic link for mbedtls
Two build processes ran concurrently or a previous build was interrupted. Fix:
```sh
rm -rf build sdkconfig
idf.py set-target esp32s3
```
Never run set-target while a build is in progress. Always clean first.
ESP-IDF component dependencies
Declared in main/idf_component.yml:
```yaml
dependencies:
  idf:
    version: ">=5.3.0"
  waveshare/esp32_s3_touch_amoled_2_06:
    version: "^1.0.0"
```
The Waveshare BSP pulls in transitive dependencies:
- lvgl/lvgl: UI framework (v9)
- espressif/esp_lvgl_port: LVGL-to-ESP-LCD bridge
- waveshare/esp_lcd_sh8601: SH8601 QSPI AMOLED driver
- espressif/esp_lcd_touch_ft5x06: Touch driver
- espressif/esp_codec_dev: Audio codec abstraction (ES8311)
Lessons for related ESP32-S3 projects
- Always check buffer DMA capability. Any buffer passed to SPI, I2S, or other DMA peripherals must be in DMA-capable memory. On ESP32-S3 with PSRAM, malloc() may silently return PSRAM pointers. Use heap_caps_malloc(size, MALLOC_CAP_DMA) or set buff_dma = true in framework configs.
- PSRAM is not DMA-capable on ESP32-S3. Unlike some other ESP32 variants, SOC_PSRAM_DMA_CAPABLE is not defined for ESP32-S3. The SPI driver will attempt a bounce-buffer copy for PSRAM buffers, but this fails for large allocations.
- Don't trust BSP components blindly. The Waveshare BSP called the wrong LVGL port function for its own panel type. Verify that the display interface type (RGB parallel vs SPI/QSPI) matches the LVGL port registration function.
- Understand the flush signaling chain. For SPI/QSPI displays, lv_disp_flush_ready() must only be called after the DMA transfer completes. The esp_lvgl_port library handles this via the on_color_trans_done IO callback when using lvgl_port_add_disp() (not the _rgb or _dsi variants).
- Keep LVGL draw buffers small. Internal DMA memory is limited (~300KB total, shared with WiFi, Bluetooth, and other drivers). A 41KB draw buffer (50 lines at 410px wide, RGB565) is a safe size. Larger buffers may fail to allocate.
- Rounded displays need safe-area padding. Apply padding to the root LVGL screen object. Measure or estimate the corner radius and inset content accordingly.
- USB Serial JTAG is fragile during development. If firmware crashes before USB initializes, you lose both console output and the ability to flash. Always have the BOOT mode recovery procedure ready: hold BOOT + press RESET.
- Speaker PA must be explicitly enabled. The power amplifier on GPIO 46 defaults to off. No sound will come from the speaker without gpio_set_level(GPIO_NUM_46, 1) before playback.
- NVS must be initialized before WiFi. Call nvs_flash_init() before any WiFi operations. Handle ESP_ERR_NVS_NO_FREE_PAGES by erasing and re-initializing.
- Avoid concurrent idf.py commands. Running set-target, build, or flash in parallel corrupts the build directory (symlink conflicts, locked files). Always wait for one command to finish before starting another.
- LVGL software shadows will freeze your display. lv_draw_sw_box_shadow is extremely expensive on ESP32-S3. A 30px shadow on a single widget can exceed the watchdog timeout, preventing LVGL from ever completing a frame render. The screen appears white because the GRAM is never written. Use shadow_width = 0 or substitute colored borders for visual emphasis. The LVGL dark theme (lv_theme_default_init with dark=true) also adds default shadows to buttons; avoid it unless you override shadow styles globally.
- Never block in LVGL event handlers waiting on a task that needs the display lock. LVGL event callbacks (click handlers, timer callbacks) run with the display mutex held. If your callback calls a blocking construct like while (flag) vTaskDelay(), and the waited-on task also calls bsp_display_lock(), you get a deadlock: the LVGL handler holds the lock and waits for the task, while the task waits for the lock. Make stop/cancel operations non-blocking: just set a flag and return. The task will see the flag on its next iteration.
- esp_codec_dev_read() returns an error code, not a byte count. Unlike POSIX read(), esp_codec_dev_read() returns ESP_CODEC_DEV_OK (0) on success. The requested number of bytes is always fully read on success. Treating the return value as a byte count (e.g., if (ret > 0)) causes an infinite spin loop because success returns 0. Use if (ret != ESP_CODEC_DEV_OK) to check for errors.
- Enable FAT long filename support for any name longer than 8.3. Directory names like "recordings" (10 characters) and filenames like "REC_20260426_170206.wav" (23 characters) exceed the 8.3 FAT limit. Without CONFIG_FATFS_LFN_HEAP=y in sdkconfig.defaults, mkdir() and fopen() will silently fail. This is especially insidious when combined with a flag-ordering bug: if s_mounted is set before mkdir() succeeds, the app thinks storage is ready but can't create files.
- Mic and speaker codecs may share I2S resources. On boards with a single audio codec (like the ES8311), the microphone and speaker paths share the same I2S bus. Opening the mic codec while the speaker codec is still closing causes the open to fail. When transitioning from playback to recording, signal playback to stop and wait for it to fully finish before opening the mic codec.
- Remove LV_OBJ_FLAG_CLICKABLE and LV_OBJ_FLAG_SCROLLABLE from layout containers. lv_obj_create() sets both flags by default. A transparent container used only for flex layout will silently swallow touch events meant for its parent. If a list item contains a child object created with lv_obj_create() for column layout, that child eats clicks and the item's event handler never fires.
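The deadlock and esp_codec_dev_read() lessons above both concern the recording task's main loop. Below is a host-runnable C sketch of that loop under stated assumptions: CODEC_OK stands in for ESP_CODEC_DEV_OK, codec_read_fn is a hypothetical signature modeled on esp_codec_dev_read(), a C11 atomic flag replaces whatever flag the real audio_recorder.c uses, and the stub readers exist only so the loop can run without hardware. FreeRTOS task creation and WAV writes are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for ESP_CODEC_DEV_OK (0) so the sketch builds on a host. */
#define CODEC_OK 0

/* Hypothetical signature modeled on esp_codec_dev_read(): returns CODEC_OK
 * after filling all `len` bytes, or a nonzero error code. */
typedef int (*codec_read_fn)(void *buf, int len);

/* Stop flag shared between the UI and the recording task. */
static atomic_bool s_stop_requested = false;

/* Called from an LVGL event handler while the display lock is held.
 * Non-blocking: it sets the flag and returns immediately instead of
 * waiting for the task, which may itself be waiting for the lock. */
static void recorder_request_stop(void)
{
    atomic_store(&s_stop_requested, true);
}

/* Called before a new recording starts. */
static void recorder_clear_stop(void)
{
    atomic_store(&s_stop_requested, false);
}

/* Recording task body. Returns the number of chunks captured before a
 * stop request (capped at max_chunks in this sketch), or -1 if the codec
 * reported an error. */
static int recorder_run(codec_read_fn read_fn, void *buf, int chunk_bytes,
                        int max_chunks)
{
    int chunks = 0;
    while (!atomic_load(&s_stop_requested) && chunks < max_chunks) {
        int ret = read_fn(buf, chunk_bytes);
        if (ret != CODEC_OK) {   /* not `ret > 0`: success returns 0 */
            return -1;           /* let the caller reset the UI */
        }
        /* ...append chunk_bytes bytes of PCM to the WAV file here... */
        chunks++;
    }
    return chunks;
}

/* Host-side stubs standing in for the microphone codec. */
static int stub_read_ok(void *buf, int len)
{
    memset(buf, 0, (size_t)len);  /* a chunk of silence */
    return CODEC_OK;
}

static int stub_read_fail(void *buf, int len)
{
    (void)buf;
    (void)len;
    return -1;
}
```

On the device, recorder_run() would be the body of the FreeRTOS recording task, and recorder_request_stop() is all the Stop button's event handler needs to call.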