Home

3475. DNA Pattern Recognition

Tables

erDiagram
	Samples {
		sample_id int PK
		dna_sequence varchar
		species varchar
	}
Mermaid
복사

Question

Biologists are studying basic patterns in DNA sequences. Write a solution to identify sample_id with the following patterns:
Sequences that start with ATG (a common start codon)
Sequences that end with either TAATAG, or TGA (stop codons)
Sequences containing the motif ATAT (a simple repeated pattern)
Sequences that have at least 3 consecutive G (like GGG or GGGG)
Return the result table ordered by sample_id in ascending order.

DataFlow

like ATG%
like %(TAA, TAG, TGA)
IN ATAT
IN GGG

Answer

SELECT sample_id, dna_sequence, species, CASE WHEN LEFT(dna_sequence, 3) = 'ATG' THEN 1 ELSE 0 END AS has_start, CASE WHEN RIGHT(dna_sequence, 3) IN ('TAA', 'TAG', 'TGA') THEN 1 ELSE 0 END AS has_stop, CASE WHEN INSTR(dna_sequence, 'ATAT') != 0 THEN 1 ELSE 0 END AS has_atat, CASE WHEN INSTR(dna_sequence, 'GGG') != 0 THEN 1 ELSE 0 END AS has_ggg FROM Samples ORDER BY sample_id ASC;
SQL
복사
SELECT sample_id, dna_sequence, species, (LEFT(dna_sequence, 3) = 'ATG') AS has_start, (RIGHT(dna_sequence, 3) IN ('TAA', 'TAG', 'TGA')) AS has_stop, (INSTR(dna_sequence, 'ATAT') != 0) AS has_atat, (INSTR(dna_sequence, 'GGG') != 0) AS has_ggg FROM Samples ORDER BY sample_id ASC;
SQL
복사
SELECT *, dna_sequence REGEXP '^ATG' AS has_start, dna_sequence REGEXP 'TAA$|TAG$|TGA$' AS has_stop, dna_sequence REGEXP 'ATAT' AS has_atat, dna_sequence REGEXP 'GGG' AS has_ggg FROM samples ORDER BY sample_id ASC;
SQL
복사