Compare commits

31 Commits

Author SHA1 Message Date
Charles Reid 9092f98bf6 Add solution to BA2F (#17) 6 years ago
Charles Reid 2a580cbdc7 add a maxint and minint utility function (#16) 6 years ago
Charles Reid 4810449c86 Fix BA2e solution (#14) 6 years ago
Charles Reid 216872f46b Add solution to BA2E (#12) 6 years ago
Charles Reid 49d7385a4c add materials for BA2D (in progress) (#11) 6 years ago
Charles Reid 10b2a14f54 Add BA2c solution (#10) 6 years ago
Charles Reid a0c9f211bb add chapter3 module, and rosalind functions for BA3a thru BA3c (#9) 6 years ago
Charles Reid 87eac56bc4 Add Stronghold module (#8) 6 years ago
Charles Reid 8543defb86 Continue work on Chapter 2 problems (#7) 6 years ago
Charles Reid 2a64f89e35 Add code for Chapter 2 (#6) 6 years ago
Charles Reid 8992e3e3ce Merge branch 'chapter1cleanup' 6 years ago
Charles Reid 61b538675d Merge branch 'master' of github.com:charlesreid1/go-rosalind into chapter1cleanup 6 years ago
Charles Reid e7b887b94a shield links 6 years ago
Charles Reid 385bd4ac58 add language shield. shields up. 6 years ago
Charles Reid cf22c98e2b clean up chapter 1, consolidate tests (#5) 6 years ago
Charles Reid ef2172d04c add MIT license 6 years ago
Charles Reid b971d2d1f9 add license shield 6 years ago
Charles Reid 257bdd3f5b fix (remove) custom travis go import path 6 years ago
Charles Reid f70724b137 add CheckIsDNA to BA1A 6 years ago
Charles Reid ef412f12f0 tests for chapter01 working 6 years ago
Charles Reid 3a2b8d65af update readme to reflect chapter 1 cleanup 6 years ago
Charles Reid d8efc0df41 add chapter01 tests to travis file 6 years ago
Charles Reid 42f5d6f5fd clean up chapter 1, consolidate tests 6 years ago
Charles Reid 6a5d827185 yikes, spaces 6 years ago
Charles Reid d3782ad84d fix tab 6 years ago
Charles Reid d4cf42824e fix import statements in readme example 6 years ago
Charles Reid 8f24f5f75c fix typo in readme example 6 years ago
Charles Reid 60abfd5288 fix typo in .travis.yml 6 years ago
Charles Reid 1695a36aa4 Make this go get friendly (#3) 6 years ago
Charles Reid 7adf5b5418 Chapter1 part2 (#2) 6 years ago
Charles Reid e65a3d6726 Add solutions for Chapter 1 (#1) 6 years ago
  1. 6
      .gitignore
  2. 14
      .travis.yml
  3. 19
      LICENSE
  4. 131
      Readme.md
  5. 73
      chapter01/Readme.md
  6. 54
      chapter01/ba1a.go
  7. 99
      chapter01/ba1a_test.go
  8. 58
      chapter01/ba1b.go
  9. 82
      chapter01/ba1b_test.go
  10. 50
      chapter01/ba1c.go
  11. 123
      chapter01/ba1c_test.go
  12. 61
      chapter01/ba1d.go
  13. 97
      chapter01/ba1d_test.go
  14. 58
      chapter01/ba1e.go
  15. 42
      chapter01/ba1e_test.go
  16. 60
      chapter01/ba1f.go
  17. 53
      chapter01/ba1f_test.go
  18. 52
      chapter01/ba1g.go
  19. 49
      chapter01/ba1g_test.go
  20. 65
      chapter01/ba1h.go
  21. 56
      chapter01/ba1h_test.go
  22. 15
      chapter01/main.go
  23. 545
      chapter01/rosalind.go
  24. 21
      chapter01/todo.md
  25. 95
      chapter01/utils.go
  26. 69
      chapter1/Readme.md
  27. 55
      chapter1/ba1a.go
  28. 59
      chapter1/ba1b.go
  29. 51
      chapter1/ba1c.go
  30. 61
      chapter1/ba1d.go
  31. 59
      chapter1/ba1e.go
  32. 61
      chapter1/ba1f.go
  33. 53
      chapter1/ba1g.go
  34. 66
      chapter1/ba1h.go
  35. 70
      chapter1/ba1i.go
  36. 71
      chapter1/ba1j.go
  37. 62
      chapter1/ba1k.go
  38. 51
      chapter1/ba1lima.go
  39. 62
      chapter1/ba1m.go
  40. 60
      chapter1/ba1n.go
  41. 20
      chapter1/chapter1_test.go
  42. 0
      chapter1/for_real/rosalind_ba1a.txt
  43. 0
      chapter1/for_real/rosalind_ba1b.txt
  44. 0
      chapter1/for_real/rosalind_ba1c.txt
  45. 0
      chapter1/for_real/rosalind_ba1d.txt
  46. 0
      chapter1/for_real/rosalind_ba1e.txt
  47. 0
      chapter1/for_real/rosalind_ba1f.txt
  48. 0
      chapter1/for_real/rosalind_ba1g.txt
  49. 0
      chapter1/for_real/rosalind_ba1h.txt
  50. 2
      chapter1/for_real/rosalind_ba1i.txt
  51. 2
      chapter1/for_real/rosalind_ba1j.txt
  52. 2
      chapter1/for_real/rosalind_ba1k.txt
  53. 1
      chapter1/for_real/rosalind_ba1l.txt
  54. 2
      chapter1/for_real/rosalind_ba1m.txt
  55. 2
      chapter1/for_real/rosalind_ba1n.txt
  56. 1
      chapter1/utils.go
  57. 69
      chapter2/Readme.md
  58. 67
      chapter2/ba2a.go
  59. 61
      chapter2/ba2b.go
  60. 54
      chapter2/ba2c.go
  61. 67
      chapter2/ba2d.go
  62. 67
      chapter2/ba2e.go
  63. 64
      chapter2/ba2f.go
  64. 65
      chapter2/ba2g.go
  65. 13
      chapter2/chapter2_test.go
  66. 11
      chapter2/for_real/rosalind_ba2a.txt
  67. 11
      chapter2/for_real/rosalind_ba2b.txt
  68. 6
      chapter2/for_real/rosalind_ba2c.txt
  69. 26
      chapter2/for_real/rosalind_ba2d.txt
  70. 26
      chapter2/for_real/rosalind_ba2e.txt
  71. 21
      chapter2/for_real/rosalind_ba2f.txt
  72. 21
      chapter2/for_real/rosalind_ba2g.txt
  73. 77
      chapter2/populate_templates.py
  74. 49
      chapter2/template.go.j2
  75. 60
      chapter3/ba3a.go
  76. 54
      chapter3/ba3b.go
  77. 54
      chapter3/ba3c.go
  78. 9
      chapter3/chapter3_test.go
  79. 2
      chapter3/for_real/rosalind_ba3a.txt
  80. 4976
      chapter3/for_real/rosalind_ba3b.txt
  81. 981
      chapter3/for_real/rosalind_ba3c.txt
  82. 49
      chapter3/populate_templates.py
  83. 49
      chapter3/template.go.j2
  84. 4
      rosalind/Readme.md
  85. 0
      rosalind/data/clump_finding.txt
  86. 0
      rosalind/data/frequent_words.txt
  87. 5
      rosalind/data/frequent_words_mismatch.txt
  88. 5
      rosalind/data/frequent_words_mismatch_complements.txt
  89. 4979
      rosalind/data/genome_path_string.txt
  90. 0
      rosalind/data/hamming_distance.txt
  91. 0
      rosalind/data/minimum_skew.txt
  92. 10
      rosalind/data/motif_enumeration.txt
  93. 2624
      rosalind/data/neighbors.txt
  94. 5
      rosalind/data/number_to_pattern.txt
  95. 19953
      rosalind/data/overlap_graph.txt
  96. 0
      rosalind/data/pattern_count.txt
  97. 0
      rosalind/data/pattern_matching.txt
  98. 4
      rosalind/data/pattern_to_number.txt
  99. 0
      rosalind/data/reverse_complement.txt
  100. 9517
      rosalind/data/string_composition.txt
Some files were not shown because too many files have changed in this diff.

6
.gitignore

@@ -1,3 +1,9 @@
golibby
queens
chapter01/chapter01
# golang:
# Binaries for programs and plugins
*.exe
*.exe~

14
.travis.yml

@@ -0,0 +1,14 @@
# https://docs.travis-ci.com/user/languages/go/
language: go
go:
- 1.10.x
- 1.11.x
- tip
install: true
script:
- go test -v ./rosalind/...
- go test -v ./chapter1/...
- go test -v ./chapter2/...
- go test -v ./chapter3/...

19
LICENSE

@@ -0,0 +1,19 @@
Copyright 2019 Charles Reid
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

131
Readme.md

@@ -1,41 +1,126 @@
# Go-Rosalind
# go-rosalind
Solving problems from Rosalind.info using Go
`rosalind` is a Go (golang) package for solving bioinformatics problems.
## Organization
[![travis](https://img.shields.io/travis/charlesreid1/go-rosalind.svg)](https://travis-ci.org/charlesreid1/go-rosalind)
[![golang](https://img.shields.io/badge/language-golang-00ADD8.svg)](https://golang.org)
[![license](https://img.shields.io/github/license/charlesreid1/go-rosalind.svg)](https://github.com/charlesreid1/go-rosalind/blob/master/LICENSE)
[![godoc](https://godoc.org/github.com/charlesreid1/go-rosalind?status.svg)](http://godoc.org/github.com/charlesreid1/go-rosalind)
## Summary
Each chapter has its own directory.
This repo contains a Go (golang) library, `rosalind`, that implements
functionality for solving bioinformatics problems. This is mainly
useful for problems on Rosalind.info but is for general use as well.
Within the chapter directory, each problem has
its own driver program, which prints info about
the problem, loads the input file from Rosalind,
and prints the solution. Each problem also has
its own test suite using the examples provided
on Rosalind.info.
Rosalind problems are grouped by chapter. Each problem has its own
function and is implemented in a library called `chapter1`, `chapter2`,
etc.
For example, the function that loads the
input file for problem BA1A is in `ba1a.go`
and the code to test the functionality
of the solution to BA1A is in `ba1a_test.go`.
For example, Chapter 1 question A is implemented in package
`chapter1` as the function `BA1a( <input-file-name> )`.
This (specific) functionality wraps the (general purpose)
`rosalind` library.
## Quick Start
To run all the tests in a chapter directory:
### Rosalind
The `rosalind` library can be installed using `go get`:
```
go get github.com/charlesreid1/go-rosalind/rosalind
```
The library can now be imported and its functions called directly.
Here is a brief example:
```
package main
import (
"fmt"
"github.com/charlesreid1/go-rosalind/rosalind"
)
func main() {
input := "AAAATGCGCTAGTAAAAGTCACTGAAAA"
k := 4
result, _ := rosalind.MostFrequentKmers(input, k)
fmt.Println(result)
}
```
go test -v
### Problem Sets
Each set of problems is grouped into its own package. These
packages import the `rosalind` package, so it should be
available.
You can install the Chapter 1 problem set, for example, like so:
```
go get github.com/charlesreid1/go-rosalind/chapter1
```
This can now be imported and used in any Go program.
Try creating a `main.go` file in a temporary directory,
and run it with `go run main.go`:
```
package main
To run only a particular problem:
import (
rch1 "github.com/charlesreid1/go-rosalind/chapter1"
)
1. Edit `main.go` to call the right method
for the right problem with the right input
file name.
func main() {
filename := "rosalind_ba1a.txt"
rch1.BA1a(filename)
}
```
2. Run `main.go` using `go run`, and point Go
to all the relevant Go files:
Assuming an input file `rosalind_ba1a.txt` is available,
you should see a problem description and the output of
the problem, which can be copied and pasted into
Rosalind.info:
```
go run main.go utils.go rosalind.go <name-of-BA-file>
$ go run main.go
-----------------------------------------
Rosalind: Problem BA1a:
Compute the Number of Times a Pattern Appears in a Text
Given an input string and a pattern,
count the number of times the pattern
occurs in the string (overlaps included).
URL: http://rosalind.info/problems/ba1a/
Computed result from input file: for_real/rosalind_ba1a.txt
39
```
## Command Line Interface
TBA
## Organization
The repo contains the following directories:
* `rosalind/` - code and functions for the Rosalind library
* `chapter1/` - solutions to chapter 1 questions (utilizes `rosalind` library)
* `chapter2/` - solutions to chapter 2 questions
* `chapter3/` - solutions to chapter 3 questions
* `stronghold/` - solutions to questions from the stronghold section of Rosalind.info
See the Readme file in each respective directory for more info.

73
chapter01/Readme.md

@@ -1,73 +0,0 @@
# Chapter 1
In this chapter we perform basic operations with
strings and data structures.
## How to run
* Each problem has its own function
* To run the code for a particular problem,
call the function for that problem in `main.go`
* Edit `main.go` to call the right function,
and pass in the name of the input file you
want to use: for example, `BA1A("input.txt")`
* The function you call is implemented in the
corresponding Go file (for example, `ba1a.go`).
It loads the inputs from the input file,
calls the right function with the inputs,
and prints the results.
* The functions that load data from input files
are tested along with the functions themselves,
since each problem has a sample input file
in `data/`
## Directory Layout
* Each problem has one Go file and one test
* The `data/` directory contains input files
for the tests (i.e., files that contain both
inputs and corresponding outputs)
* The `for_real/` directory contains sample
input files from Rosalind.info for each
problem (i.e., files that contain only the
inputs)
* The `main.go` file contains the `main()`
driver function and is the entrypoint for
`go run`
* The `rosalind.go` file contains most of the
computational functionality implemented
for the problems.
* The `utils.go` file contains utilities unrelated
to bioinformatics.
## Compiling and Running
To run all tests, `go test`:
```
go test -v
```
To run a specific problem, edit `main.go`
to call the corresponding problem's function
and then `go run`:
```
go run main.go utils.go rosalind.go <name of ba1 file.go>
```
## To Do
Add a Snakefile

54
chapter01/ba1a.go

@@ -1,54 +0,0 @@
package main
import (
"fmt"
"log"
)
// Rosalind: Problem BA1A: Compute the Number of Times a Pattern Appears in a Text
// Describe the problem
func BA1ADescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1A:",
"Compute the Number of Times a Pattern Appears in a Text",
"",
"Given an input string and a pattern,",
"count the number of times the pattern",
"occurs in the string (overlaps included).",
"",
"URL: http://rosalind.info/problems/ba1a/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem,
// print the name of the input file,
// print the output/result
func BA1A(filename string) {
BA1ADescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("readLines: %v",err)
}
// Input file contents
var input, pattern string
input = lines[0]
pattern = lines[1]
result := PatternCount(input, pattern)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(result)
}

99
chapter01/ba1a_test.go

@@ -1,99 +0,0 @@
package main
import (
"fmt"
"log"
"strconv"
"testing"
)
// To run this test:
//
// $ go test -v -run TestPatternCount
// Run a single test of the PatternCount function
func TestPatternCount(t *testing.T) {
// Call the PatternCount function
input := "GCGCG"
pattern := "GCG"
result := PatternCount(input,pattern)
gold := 2
if result != gold {
err := fmt.Sprintf("Error testing PatternCount(): input = %s, pattern = %s, result = %d (should be %d)",
input, pattern, result, gold)
t.Error(err)
}
}
// Run a test matrix of the PatternCount function
func TestMatrixPatternCount(t *testing.T) {
// Construct a test matrix
var tests = []struct {
input string
pattern string
gold int
}{
{"GCGCG", "GCG", 2},
{"GAGGGGGGGAG", "AGG", 1},
{"GCACGCACGCAC", "GCAC", 3},
{"", "GC", 0},
{"GCG", "GTACTCTC", 0},
{"ACGTACGTACGT", "CG", 3},
{"AAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAATAATTACAGAGTACACAACATCCA",
"AAA", 4},
{"AGCGTGCCGAAATATGCCGCCAGACCTGCTGCGGTGGCCTCGCCGACTTCACGGATGCCAAGTGCATAGAGGAAGCGAGCAAAGGTGGTTTCTTTCGCTTTATCCAGCGCGTTAACCACGTTCTGTGCCGACTTT",
"TTT", 4},
{"GGACTTACTGACGTACG","ACT", 2},
{"ATCCGATCCCATGCCCATG","CC", 5},
{"CTGTTTTTGATCCATGATATGTTATCTCTCCGTCATCAGAAGAACAGTGACGGATCGCCCTCTCTCTTGGTCAGGCGACCGTTTGCCATAATGCCCATGCTTTCCAGCCAGCTCTCAAACTCCGGTGACTCGCGCAGGTTGAGT",
"CTC", 9},
}
for _, test := range tests {
result := PatternCount(test.input, test.pattern)
if result != test.gold {
err := fmt.Sprintf("Error testing PatternCount(): input = %s, pattern = %s, result = %d (should be %d)",
test.input, test.pattern, result, test.gold)
t.Error(err)
}
}
}
// Load a PatternCount test (input and output)
// from a file. Run the test with the input
// and verify the output matches the output
// contained in the file.
func TestPatternCountFile(t *testing.T) {
filename := "data/pattern_count.txt"
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("readLines: %v",err)
}
// lines[0]: Input
input := lines[1]
pattern := lines[2]
// lines[3]: Output
output_str := lines[4]
// Convert output to integer
output,err := strconv.Atoi(output_str)
if err!=nil {
t.Error(err)
}
// Call the function with the given inputs
result := PatternCount(input, pattern)
// Verify answer
if result != output {
err := fmt.Sprintf("Error testing PatternCount using test case from file: results do not match:\ncomputed result = %d\nexpected output = %d",result,output)
t.Error(err)
}
}
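
The test matrix above pins down the expected behaviour of `PatternCount`: overlapping matches count (for example, `GCG` occurs twice in `GCGCG`), and an empty text or an over-long pattern yields 0. The implementation itself lives in `rosalind.go`, which is not shown in this part of the diff, so the following is only a minimal sketch of a function consistent with those tests. The lowercase name `patternCount` is a hypothetical stand-in, not the repository's code.

```go
package main

import "fmt"

// patternCount is a hypothetical stand-in for the repo's PatternCount.
// It slides a window of len(pattern) across text and counts every match,
// so overlapping occurrences are included.
func patternCount(text, pattern string) int {
	if len(pattern) == 0 || len(pattern) > len(text) {
		return 0
	}
	count := 0
	for i := 0; i+len(pattern) <= len(text); i++ {
		if text[i:i+len(pattern)] == pattern {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(patternCount("GCGCG", "GCG")) // prints 2
}
```

Advancing the window by one position each step, rather than by the pattern length, is what makes the overlapping case come out right.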

58
chapter01/ba1b.go

@@ -1,58 +0,0 @@
package main
import (
"fmt"
"log"
"strings"
"strconv"
)
// Rosalind: Problem BA1B: Most Frequent k-mers
// Describe the problem
func BA1BDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1B:",
"Most Frequent k-mers",
"",
"Given an input string and a length k,",
"report the k-mer or k-mers that occur",
"most frequently.",
"",
"URL: http://rosalind.info/problems/ba1b/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1B(filename string) {
BA1BDescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// Input file contents
input := lines[0]
k_str := lines[1]
k,err := strconv.Atoi(k_str)
if err!=nil {
log.Fatalf("Error: string to int conversion: %v",err)
}
mfks,_ := MostFrequentKmers(input,k)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(strings.Join(mfks," "))
}
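
The driver above delegates to `MostFrequentKmers(input, k)`, whose body is not part of this file. As a rough illustration of the problem statement (report the k-mer or k-mers that occur most often), here is a map-based sketch. The name `mostFrequentKmers`, the sorted output, and the `nil` return for an invalid `k` are assumptions made for this example; the repository's function also returns an error value, which is omitted here.

```go
package main

import (
	"fmt"
	"sort"
)

// mostFrequentKmers is a hypothetical sketch, not the repo's implementation.
// It counts every k-mer in text with a map and returns those with the
// highest count, sorted alphabetically for a deterministic result.
func mostFrequentKmers(text string, k int) []string {
	if k <= 0 || k > len(text) {
		return nil
	}
	counts := make(map[string]int)
	best := 0
	for i := 0; i+k <= len(text); i++ {
		kmer := text[i : i+k]
		counts[kmer]++
		if counts[kmer] > best {
			best = counts[kmer]
		}
	}
	var result []string
	for kmer, n := range counts {
		if n == best {
			result = append(result, kmer)
		}
	}
	sort.Strings(result)
	return result
}

func main() {
	fmt.Println(mostFrequentKmers("AAAATGCGCTAGTAAAAGTCACTGAAAA", 4)) // prints [AAAA]
}
```

Sorting matters because Go randomizes map iteration order; the test in `ba1b_test.go` sorts both slices before comparing for the same reason.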

82
chapter01/ba1b_test.go

@@ -1,82 +0,0 @@
package main
import (
"fmt"
"sort"
"strconv"
"strings"
"log"
"testing"
)
// Run a test of the MostFrequentKmers function
func TestMostFrequentKmers(t *testing.T) {
// Call MostFrequentKmers
input := "AAAATGCGCTAGTAAAAGTCACTGAAAA"
k := 4
result,err := MostFrequentKmers(input,k)
gold := []string{"AAAA"}
if err!=nil {
t.Error(err)
}
if !EqualStringSlices(result,gold) {
err := fmt.Sprintf("Error testing MostFrequentKmers(): input = %s, k = %d, result = %s (should be %s)",
input, k, result, gold)
t.Error(err)
}
}
// Run a test of the MostFrequentKmers function
// using inputs/outputs from a file.
func TestMostFrequentKmersFile(t *testing.T) {
filename := "data/frequent_words.txt"
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("readLines: %v",err)
}
// lines[0]: Input
dna := lines[1]
k_str := lines[2]
// lines[3]: Output
gold := strings.Split(lines[4]," ")
// Convert k to integer
k,err := strconv.Atoi(k_str)
if err!=nil {
t.Error(err)
}
// Call the function with the given inputs
result, err := MostFrequentKmers(dna,k)
// Check if function threw error
if err!=nil {
t.Error(err)
}
// Check that there _was_ a result
if len(result)==0 {
err := fmt.Sprintf("Error testing MostFrequentKmers using test case from file: length of most frequent kmers found was 0: %q",
result)
t.Error(err)
}
// Sort before comparing
sort.Strings(gold)
sort.Strings(result)
// These will only be unequal if something went wrong
if !EqualStringSlices(gold,result) {
err := fmt.Sprintf("Error testing MostFrequentKmers using test case from file: most frequent kmers mismatch.\ncomputed = %q\ngold = %q\n",
result,gold)
t.Error(err)
}
}

50
chapter01/ba1c.go

@@ -1,50 +0,0 @@
package main
import (
"fmt"
"log"
)
// Rosalind: Problem BA1C: Find the Reverse Complement of a String
// Describe the problem
func BA1CDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1C:",
"Find the Reverse Complement of a String",
"",
"Given a DNA input string,",
"find the reverse complement",
"of the DNA string.",
"",
"URL: http://rosalind.info/problems/ba1c/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1C(filename string) {
BA1CDescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// Input file contents
input := lines[0]
result,_ := ReverseComplement(input)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(result)
}
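
`ReverseComplement` is called above but defined elsewhere. Judging by the tests, the repository builds it on bitmask helpers (`DNA2Bitmasks`, `Bitmasks2DNA`). The sketch below deliberately swaps in a simpler technique, a lookup table walked from the end of the string, just to illustrate what the function computes. The name `reverseComplement` and the error message are assumptions for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// reverseComplement is a hypothetical sketch using a lookup table
// (not the bitmask approach the repo's tests suggest). It walks the
// input from the last base to the first and emits each base's complement.
func reverseComplement(dna string) (string, error) {
	complement := map[byte]byte{'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
	var sb strings.Builder
	sb.Grow(len(dna))
	for i := len(dna) - 1; i >= 0; i-- {
		c, ok := complement[dna[i]]
		if !ok {
			return "", fmt.Errorf("invalid nucleotide %q at index %d", dna[i], i)
		}
		sb.WriteByte(c)
	}
	return sb.String(), nil
}

func main() {
	rc, err := reverseComplement("AAAACCCGGT")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rc) // prints ACCGGGTTTT
}
```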

123
chapter01/ba1c_test.go

@@ -1,123 +0,0 @@
package main
import (
"fmt"
"testing"
)
// Check that the DNA2Bitmasks utility
// extracts the correct bitmasks from
// a DNA input string.
func TestDNA2Bitmasks(t *testing.T) {
input := "AATCCGCT"
result, func_err := DNA2Bitmasks(input)
// Handle errors from the DNA2Bitmasks function
if func_err != nil {
err := fmt.Sprintf("Error in function DNA2Bitmasks(): input = %s", input)
t.Error(err)
}
// Assemble gold standard answer (bitvectors)
tt := true
ff := false
gold := make(map[string][]bool)
gold["A"] = []bool{tt,tt,ff,ff,ff,ff,ff,ff}
gold["T"] = []bool{ff,ff,tt,ff,ff,ff,ff,tt}
gold["C"] = []bool{ff,ff,ff,tt,tt,ff,tt,ff}
gold["G"] = []bool{ff,ff,ff,ff,ff,tt,ff,ff}
// Verify result from DNA2Bitmasks is same as
// our gold standard
for _,cod := range "ATCG" {
cods := string(cod)
if !EqualBoolSlices(result[cods],gold[cods]) {
err := fmt.Sprintf("Error testing DNA2Bitmasks(): input = %s, codon = %s, extracted = %v, gold = %v",
input, cods, result[cods], gold[cods])
t.Error(err)
}
}
}
// Check that the Bitmasks2DNA utility
// constructs the correct DNA string
// from bitmasks.
func TestBitmasks2DNA(t *testing.T) {
// Assemble input bitmasks
tt := true
ff := false
input := make(map[string][]bool)
input["A"] = []bool{tt,tt,ff,ff,ff,ff,ff,ff}
input["T"] = []bool{ff,ff,tt,ff,ff,ff,ff,tt}
input["C"] = []bool{ff,ff,ff,tt,tt,ff,tt,ff}
input["G"] = []bool{ff,ff,ff,ff,ff,tt,ff,ff}
gold := "AATCCGCT"
result, func_err := Bitmasks2DNA(input)
// Handle errors from the Bitmasks2DNA function
if func_err != nil {
err := fmt.Sprintf("Error in function Bitmasks2DNA(): function returned error")
t.Error(err)
}
// Verify result from Bitmasks2DNA is same as
// our gold standard
if result != gold {
err := fmt.Sprintf("Error testing Bitmasks2DNA(): result = %s, gold = %s", result, gold)
t.Error(err)
}
}
// Run a test of the function that computes
// the ReverseComplement of a DNA string.
func TestReverseComplement(t *testing.T) {
input := "AAAACCCGGT"
result,_ := ReverseComplement(input)
gold := "ACCGGGTTTT"
if result!=gold {
err := fmt.Sprintf("Error testing ReverseComplement(): input = %s, result = %s (should be %s)",
input, result, gold)
t.Error(err)
}
}
// Run a test of the ReverseComplement function
// using inputs/outputs from a file.
func TestReverseComplementFile(t *testing.T) {
filename := "data/reverse_complement.txt"
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
t.Error(err)
}
// lines[0]: Input
input := lines[1]
// lines[2]: Output
gold := lines[3]
// Call the function with the given inputs
result, err := ReverseComplement(input)
// Check that there _was_ a result
if len(result)==0 {
err := fmt.Sprintf("Error testing ReverseComplement using test case from file")
t.Error(err)
}
if result!=gold {
err := fmt.Sprintf("Error testing ReverseComplement(): input = %s, result = %s (should be %s)",
input, result, gold)
t.Error(err)
}
}

61
chapter01/ba1d.go

@@ -1,61 +0,0 @@
package main
import (
"fmt"
"strconv"
"strings"
"log"
)
// Rosalind: Problem BA1D: Find all occurrences of pattern in string
// Describe the problem
func BA1DDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1D:",
"Find all occurrences of pattern in string",
"",
"Given a string input (genome) and a substring (pattern),",
"return all starting positions in the genome where the",
"pattern occurs in the genome.",
"",
"URL: http://rosalind.info/problems/ba1d/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1D(filename string) {
BA1DDescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// Input file contents
pattern := lines[0]
genome := lines[1]
// Result is a slice of ints
locs,_ := FindOccurrences(pattern,genome)
// Convert to a slice of strings for easier printing
locs_str := make([]string,len(locs))
for i,j := range locs {
locs_str[i] = strconv.Itoa(j)
}
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(strings.Join(locs_str," "))
}
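
The driver above prints the 0-based start positions returned by `FindOccurrences(pattern, genome)`. One possible way to compute them, sketched here under the assumption that overlapping matches must be reported (as the `ATA` in `ATATATA` test case requires), is to call `strings.Index` repeatedly and resume the search one character past each hit. The name `findOccurrences` is hypothetical, and the repository's version also returns an error value.

```go
package main

import (
	"fmt"
	"strings"
)

// findOccurrences is a hypothetical sketch, not the repo's implementation.
// It returns every 0-based index at which pattern starts in genome,
// including overlapping matches.
func findOccurrences(pattern, genome string) []int {
	var locs []int
	if pattern == "" {
		return locs
	}
	start := 0
	for start < len(genome) {
		idx := strings.Index(genome[start:], pattern)
		if idx < 0 {
			break
		}
		locs = append(locs, start+idx)
		// Resume one character after the match start so overlaps are found.
		start += idx + 1
	}
	return locs
}

func main() {
	fmt.Println(findOccurrences("ATAT", "GATATATGCATATACTT")) // prints [1 3 9]
}
```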

97
chapter01/ba1d_test.go

@@ -1,97 +0,0 @@
package main
import (
"fmt"
"log"
"strings"
"strconv"
"testing"
)
func TestFindOccurrences(t *testing.T) {
// Call FindOccurrences
pattern := "ATAT"
genome := "GATATATGCATATACTT"
result,err := FindOccurrences(pattern,genome)
gold := []int{1,3,9}
if !EqualIntSlices(result,gold) || err!=nil {
err := fmt.Sprintf("Error testing FindOccurrences(): result = %q, should be %q",
result, gold)
t.Error(err)
}
}
func TestFindOccurrencesDebug(t *testing.T) {
// Construct a test matrix
var tests = []struct {
pattern string
genome string
gold []int
}{
{"ACAC", "TTTTACACTTTTTTGTGTAAAAA",
[]int{4}},
{"AAA", "AAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAATAATTACAGAGTACACAACATCCAT",
[]int{0,46,51,74}},
{"TTT", "AGCGTGCCGAAATATGCCGCCAGACCTGCTGCGGTGGCCTCGCCGACTTCACGGATGCCAAGTGCATAGAGGAAGCGAGCAAAGGTGGTTTCTTTCGCTTTATCCAGCGCGTTAACCACGTTCTGTGCCGACTTT",
[]int{88,92,98,132}},
{"ATA", "ATATATA",
[]int{0,2,4}},
}
for _, test := range tests {
result,err := FindOccurrences(test.pattern, test.genome)
if err!=nil {
t.Error(err)
}
if !EqualIntSlices(result,test.gold) {
err := fmt.Sprintf("Error testing FindOccurrences(): result = %q, should be %q",
result, test.gold)
t.Error(err)
}
}
}
func TestFindOccurrencesFiles(t *testing.T) {
filename := "data/pattern_matching.txt"
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// lines[0]: Input
pattern := lines[1]
genome := lines[2]
// lines[3]: Output
gold_str := lines[4]
gold_slice := strings.Split(gold_str," ")
gold := make([]int,len(gold_slice))
for i,g := range gold_slice {
gold[i],err = strconv.Atoi(g)
if err!=nil {
t.Error(err)
}
}
result,err := FindOccurrences(pattern,genome)
if err!=nil {
t.Error(err)
}
if !EqualIntSlices(result,gold) {
err := fmt.Sprintf("Error testing FindOccurrences():\nresult = %v\ngold = %v\n",
result, gold)
t.Error(err)
}
}

58
chapter01/ba1e.go

@@ -1,58 +0,0 @@
package main
import (
"fmt"
"log"
"strings"
"strconv"
)
// Rosalind: Problem BA1E: Find patterns forming clumps in a string
// Describe the problem
func BA1EDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1E:",
"Find patterns forming clumps in a string",
"",
"A clump is characterized by integers L and t",
"if there is an interval in the genome of length L",
"in which a given pattern occurs t or more times.",
"",
"URL: http://rosalind.info/problems/ba1e/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1E(filename string) {
BA1EDescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// Input file contents
genome := lines[0]
params_str := lines[1]
params_slice := strings.Split(params_str," ")
k,_ := strconv.Atoi(params_slice[0])
L,_ := strconv.Atoi(params_slice[1])
t,_ := strconv.Atoi(params_slice[2])
patterns,_ := FindClumps(genome,k,L,t)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(strings.Join(patterns," "))
}
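
The clump definition in the description above (a k-mer that occurs at least t times inside some window of length L) can be turned into code quite directly. The following is a brute-force sketch that recounts k-mers for every window; it is meant to make the definition concrete, not to reproduce the repository's `FindClumps`, whose body is not shown here. The name `findClumps`, the sorted output, and the `nil` return when the genome is shorter than L are assumptions for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// findClumps is a hypothetical brute-force sketch. For each window of
// length L it counts all k-mers and records any that reach t occurrences.
// Cost is roughly O(len(genome) * L), which is fine for illustration only.
func findClumps(genome string, k, L, t int) []string {
	if k <= 0 || t <= 0 || L < k || len(genome) < L {
		return nil
	}
	found := make(map[string]bool)
	for start := 0; start+L <= len(genome); start++ {
		window := genome[start : start+L]
		counts := make(map[string]int)
		for i := 0; i+k <= len(window); i++ {
			kmer := window[i : i+k]
			counts[kmer]++
			if counts[kmer] >= t {
				found[kmer] = true
			}
		}
	}
	result := make([]string, 0, len(found))
	for kmer := range found {
		result = append(result, kmer)
	}
	sort.Strings(result)
	return result
}

func main() {
	fmt.Println(findClumps("AAAACGTCGAAAAA", 2, 4, 2)) // prints [AA]
}
```

A faster variant would update the counts incrementally as the window slides by one position, dropping the k-mer that leaves and adding the one that enters.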

42
chapter01/ba1e_test.go

@@ -1,42 +0,0 @@
package main
import (
"fmt"
"testing"
)
func TestMatrixFindClumps(t *testing.T) {
var tests = []struct {
genome string
k int
L int
t int
gold []string
}{
{"CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA",
5, 50, 4,
[]string{"CGACA","GAAGA"}},
{"AAAACGTCGAAAAA",
2, 4, 2,
[]string{"AA"}},
{"ACGTACGT",
1, 5, 2,
[]string{"A","C","G","T"}},
{"CCACGCGGTGTACGCTGCAAAAAGCCTTGCTGAATCAAATAAGGTTCCAGCACATCCTCAATGGTTTCACGTTCTTCGCCAATGGCTGCCGCCAGGTTATCCAGACCTACAGGTCCACCAAAGAACTTATCGATTACCGCCAGCAACAATTTGCGGTCCATATAATCGAAACCTTCAGCATCGACATTCAACATATCCAGCG",
3, 25, 3,
[]string{"AAA","CAG","CAT","CCA","GCC","TTC"}},
}
for _, test := range tests {
result,err := FindClumps(test.genome,
test.k, test.L, test.t)
if err!=nil {
t.Error(err)
}
if !EqualStringSlices(result,test.gold) {
err := fmt.Sprintf("Error testing FindClumps(): k = %d, L = %d, t = %d",test.k,test.L,test.t)
t.Error(err)
}
}
}

60
chapter01/ba1f.go

@@ -1,60 +0,0 @@
package main
import (
"fmt"
"strings"
"strconv"
"log"
)
// Rosalind: Problem BA1F: Find positions in a genome that minimize skew
// Describe the problem
func BA1FDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1F:",
"Find positions in a genome that minimize skew",
"",
"The skew of a genome is defined as the difference",
"between the number of G nucleotides and the number of",
"C nucleotides. Given a DNA string, this function should",
"compute the cumulative skew for each position in",
"the genome, and report the indices where the skew",
"value is minimized.",
"",
"URL: http://rosalind.info/problems/ba1f/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1F(filename string) {
BA1FDescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// Input file contents
genome := lines[0]
minskew,_ := MinSkewPositions(genome)
minskew_str := make([]string,len(minskew))
for i,j := range minskew {
minskew_str[i] = strconv.Itoa(j)
}
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(strings.Join(minskew_str," "))
}
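
The cumulative skew described above can be computed in a single pass. The test cases in `ba1f_test.go` (for example, `CCGGCCGG` giving positions 2 and 6) are consistent with skew being #G minus #C over each prefix, and with a position meaning the prefix length. The sketch below follows that reading. The name `minSkewPositions` is hypothetical, and treating the empty prefix (position 0) as a candidate is an assumption the visible tests do not settle.

```go
package main

import "fmt"

// minSkewPositions is a hypothetical sketch, not the repo's implementation.
// skew(i) = (#G - #C) over the first i bases. It returns every i at which
// the running skew reaches its minimum. Position 0 (the empty prefix,
// skew 0) is included as a candidate, which is an assumption.
func minSkewPositions(genome string) []int {
	skew, minSkew := 0, 0
	positions := []int{0}
	for i := 0; i < len(genome); i++ {
		switch genome[i] {
		case 'G':
			skew++
		case 'C':
			skew--
		}
		if skew < minSkew {
			// New minimum: discard earlier positions.
			minSkew = skew
			positions = []int{i + 1}
		} else if skew == minSkew {
			positions = append(positions, i+1)
		}
	}
	return positions
}

func main() {
	fmt.Println(minSkewPositions("CCGGCCGG")) // prints [2 6]
}
```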

53
chapter01/ba1f_test.go

@@ -1,53 +0,0 @@
package main
import (
"fmt"
"sort"
"testing"
)
func TestMatrixMinSkewPosition(t *testing.T) {
var tests = []struct {
genome string
gold []int
}{
{"CCTATCGGTGGATTAGCATGTCCCTGTACGTTTCGCCGCGAACTAGTTCACACGGCTTGATGGCAAATGGTTTTTCCGGCGACCGTAATCGTCCACCGAG",
[]int{53, 97}},
{"TAAAGACTGCCGAGAGGCCAACACGAGTGCTAGAACGAGGGGCGTAAACGCGGGTCCGA",
[]int{11, 24}},
{"ACCG",
[]int{3}},
{"ACCC",
[]int{4}},
{"CCGGGT",
[]int{2}},
{"CCGGCCGG",
[]int{2,6}},
}
for _, test := range tests {
// Do it - find the positions that minimize skew
result,err := MinSkewPositions(test.genome)
if err!=nil {
t.Error(err)
}
// Check length of result
if len(result)!=len(test.gold) {
err := fmt.Sprintf("Error testing MinSkewPositions():\nfor genome: %s\nlength of result (%d) did not match length of gold standard (%d).\nFound: %v\nShould be: %v",
test.genome, len(result), len(test.gold),
result, test.gold)
t.Error(err)
}
// Sort before comparing
sort.Ints(result)
sort.Ints(test.gold)
if !EqualIntSlices(result,test.gold) {
err := fmt.Sprintf("Error testing MinSkewPositions():\nfor genome: %s\nfound: %v\nshould be: %v",
test.genome, result, test.gold)
t.Error(err)
}
}
}

52
chapter01/ba1g.go

@@ -1,52 +0,0 @@
package main
import (
"fmt"
"log"
)
// Rosalind: Problem BA1G: Find Hamming distance between two DNA strings
// Describe the problem
func BA1GDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1G:",
"Find Hamming distance between two DNA strings",
"",
"The Hamming distance between two strings HammingDistance(p,q)",
"is the number of positions at which the two strings",
"differ. This program computes the Hamming distance",
"between two strings.",
"",
"URL: http://rosalind.info/problems/ba1g/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1G(filename string) {
BA1GDescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// Input file contents
p := lines[0]
q := lines[1]
hamm,_ := HammingDistance(p,q)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(hamm)
}

49
chapter01/ba1g_test.go

@@ -1,49 +0,0 @@
package main
import (
"fmt"
"testing"
)
func TestMatrixHammingDistance(t *testing.T) {
var tests = []struct {
p string
q string
dist int
}{
{"GGGCCGTTGGT",
"GGACCGTTGAC",
3 },
{"AAAA",
"TTTT",
4 },
{"ACGTACGT",
"TACGTACG",
8 },
{"ACGTACGT",
"CCCCCCCC",
6 },
{"ACGTACGT",
"TGCATGCA",
8 },
{"GATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATAC",
"AATAGCAGCTTCTCAACTGGTTACCTCGTATGAGTAAATTAGGTCATTATTGACTCAGGTCACTAACGTC",
15 },
{"AGAAACAGACCGCTATGTTCAACGATTTGTTTTATCTCGTCACCGGGATATTGCGGCCACTCATCGGTCAGTTGATTACGCAGGGCGTAAATCGCCAGAATCAGGCTG",
"AGAAACCCACCGCTAAAAACAACGATTTGCGTAGTCAGGTCACCGGGATATTGCGGCCACTAAGGCCTTGGATGATTACGCAGAACGTATTGACCCAGAATCAGGCTC",
28 },
}
for _, test := range tests {
result,err := HammingDistance(test.p, test.q)
if err!=nil {
t.Error(err)
}
if result!=test.dist {
err := fmt.Sprintf("Error testing HammingDistance(): computed dist = %d (should be %d)\np = %s\nq = %s\n",
result, test.dist,
test.p, test.q)
t.Error(err)
}
}
}

65
chapter01/ba1h.go

@@ -1,65 +0,0 @@
package main
import (
"fmt"
"strconv"
"strings"
"log"
)
// Rosalind: Problem BA1H: Find approximate occurrences of pattern in string
// Describe the problem
func BA1HDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1H:",
"Find approximate occurrences of pattern in string",
"",
"Given a string Text and a string Pattern, and a maximum",
"Hamming distance d, return all locations in Text where",
"there is an approximate match with Pattern (i.e., a pattern",
"with a Hamming distance from Pattern of d or less).",
"",
"URL: http://rosalind.info/problems/ba1h/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1H(filename string) {
BA1HDescription()
// Read the contents of the input file
// into a single string
lines, err := readLines(filename)
if err != nil {
log.Fatalf("Error: readLines: %v",err)
}
// Input file contents
pattern := lines[0]
text := lines[1]
d_str := lines[2]
d,err := strconv.Atoi(d_str)
if err!=nil {
log.Fatalf("Error: string to int conversion for parameter d: %v",err)
}
approx,_ := FindApproximateOccurrences(pattern,text,d)
// Convert to a slice of strings for easier printing
// (strconv.Itoa cannot fail, so no error check is needed)
approx_str := make([]string,len(approx))
for i,j := range approx {
approx_str[i] = strconv.Itoa(j)
}
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n",filename)
fmt.Println(strings.Join(approx_str," "))
}

56
chapter01/ba1h_test.go

@@ -1,56 +0,0 @@
package main
import (
"fmt"
"testing"
)
func TestMatrixApproximateOccurrences(t *testing.T) {
var tests = []struct {
pattern string
text string
d int
gold []int
}{
{"ATTCTGGA",
"CGCCCGAATCCAGAACGCATTCCCATATTTCGGGACCACTGGCCTCCACGGTACGGACGTCAATCAAATGCCTAGCGGCTTGTGGTTTCTCCTACGCTCC",
3,
[]int{6, 7, 26, 27, 78}},
{"AAA",
"TTTTTTAAATTTTAAATTTTTT",
2,
[]int{4, 5, 6, 7, 8, 11, 12, 13, 14, 15}},
{"GAGCGCTGG",
"GAGCGCTGGGTTAACTCGCTACTTCCCGACGAGCGCTGTGGCGCAAATTGGCGATGAAACTGCAGAGAGAACTGGTCATCCAACTGAATTCTCCCCGCTATCGCATTTTGATGCGCGCCGCGTCGATT",
2,
[]int{0, 30, 66}},
{"AATCCTTTCA",
"CCAAATCCCCTCATGGCATGCATTCCCGCAGTATTTAATCCTTTCATTCTGCATATAAGTAGTGAAGGTATAGAAACCCGTTCAAGCCCGCAGCGGTAAAACCGAGAACCATGATGAATGCACGGCGATTGCGCCATAATCCAAACA",
3,
[]int{3, 36, 74, 137}},
{"CCGTCATCC",
"CCGTCATCCGTCATCCTCGCCACGTTGGCATGCATTCCGTCATCCCGTCAGGCATACTTCTGCATATAAGTACAAACATCCGTCATGTCAAAGGGAGCCCGCAGCGGTAAAACCGAGAACCATGATGAATGCACGGCGATTGC",
3,
[]int{0, 7, 36, 44, 48, 72, 79, 112}},
{"TTT",
"AAAAAA",
3,
[]int{0, 1, 2, 3}},
{"CCA",
"CCACCT",
0,
[]int{0}},
}
for _, test := range tests {
result,err := FindApproximateOccurrences(test.pattern, test.text, test.d)
if err!=nil {
t.Error(err)
}
if !EqualIntSlices(result, test.gold) {
err := fmt.Sprintf("Error testing FindApproximateOccurrences:\ncomputed = %v\ngold = %v",
result, test.gold)
t.Error(err)
}
}
}

15
chapter01/main.go

@@ -1,15 +0,0 @@
package main
import (
)
func main() {
//BA1A("for_real/rosalind_ba1a.txt")
//BA1B("for_real/rosalind_ba1b.txt")
//BA1C("for_real/rosalind_ba1c.txt")
//BA1D("for_real/rosalind_ba1d.txt")
//BA1E("for_real/rosalind_ba1e.txt")
//BA1F("for_real/rosalind_ba1f.txt")
//BA1G("for_real/rosalind_ba1g.txt")
BA1H("for_real/rosalind_ba1h.txt")
}

545
chapter01/rosalind.go

@@ -1,545 +0,0 @@
package main
import (
"fmt"
"sort"
"errors"
s "strings"
)
/*
rosalind.go:
This file contains core functions that
are used to solve Rosalind problems.
*/
////////////////////////////////
// BA1A
// Count occurrences of a substring pattern
// in a string input
func PatternCount(input string, pattern string) int {
// Number of substring overlaps
var overlap = len(input) - len(pattern) + 1
// If overlap < 1, we are looking
// for a pattern longer than our input
if overlap<1 {
return 0
}
// Count of occurrences
count:=0
// Loop over each substring overlap
for i:=0; i<overlap; i++ {
// Grab a slice of the full input
start:=i
end:=i+len(pattern)
var slice = input[start:end]
if slice==pattern {
count += 1
}
}
return count
}
////////////////////////////////
// BA1B
// Return the histogram of kmers of length k
// found in the given input
func KmerHistogram(input string, k int) (map[string]int,error) {
result := map[string]int{}
if len(input)<1 {
err := "Error: KmerHistogram received an empty input string"
return result, errors.New(err)
}
// Number of substring overlaps
overlap := len(input) - k + 1
// If overlap < 1, we are looking
// for kmers longer than our input
if overlap<1 {
return result,nil
}
// Iterate over each position,
// extract the string,
// increment the count.
for i:=0; i<overlap; i++ {
// Get the kmer of interest
substr := input[i:i+k]
// If it doesn't exist, the value is 0
result[substr] += 1
}
return result,nil
}
// Find the most frequent kmer(s) in the kmer histogram,
// and return as a string array slice
func MostFrequentKmers(input string, k int) ([]string,error) {
max := 0
// most frequent kmers
mfks := []string{}
if k<1 {
err := fmt.Sprintf("Error: MostFrequentKmers received a kmer size that was not a natural number: k = %d",k)
return mfks, errors.New(err)
}
khist,err := KmerHistogram(input,k)
if err != nil {
err := fmt.Sprintf("Error: MostFrequentKmers failed when calling KmerHistogram()")
return mfks, errors.New(err)
}
for kmer,freq := range khist {
if freq > max {
// We have a new maximum, and a new set of kmers
max = freq
mfks = []string{kmer}
} else if freq==max {
// We have another maximum
mfks = append(mfks,kmer)
}
}
return mfks,nil
}
// Find the kmer(s) in the kmer histogram
// with a count of at least N, and return
// them as a string slice
func MoreFrequentThanNKmers(input string, k, N int) ([]string,error) {
// more frequent than n kmers
mftnks := []string{}
if k<1 || N<1 {
err := fmt.Sprintf("Error: MoreFrequentThanNKmers received a kmer or frequency size that was not a natural number: k = %d, N = %d",k,N)
return mftnks, errors.New(err)
}
khist,err := KmerHistogram(input,k)
if err != nil {
err := fmt.Sprintf("Error: MoreFrequentThanNKmers failed when calling KmerHistogram()")
return mftnks, errors.New(err)
}
for kmer,freq := range khist {
if freq >= N {
// Add another kmer occurring at least N times
mftnks = append(mftnks,kmer)
}
}
return mftnks,nil
}
////////////////////////////////
// BA1C
// ReverseString returns its argument string reversed
// rune-wise left to right.
// https://github.com/golang/example/blob/master/stringutil/reverse.go
func ReverseString(s string) string {
r := []rune(s)
for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
r[i], r[j] = r[j], r[i]
}
return string(r)
}
// Given an alleged DNA input string,
// iterate through it character by character
// to ensure that it only contains ATGC.
// Returns true if this is DNA (ATGC only),
// false otherwise.
func CheckIsDNA(input string) bool {
// Convert input to uppercase
input = s.ToUpper(input)
// If any character is not ATCG, fail
for _, c := range input {
if c!='A' && c!='T' && c!='C' && c!='G' {
return false
}
}
// If we made it here, everything's gravy!
return true
}
// Convert a DNA string into four bitmasks:
// one each for ATGC. That is, for the DNA
// string AATCCGCT, it would become:
//
// bitmask[A] = 11000000
// bitmask[T] = 00100001
// bitmask[C] = 00011010
// bitmask[G] = 00000100
func DNA2Bitmasks(input string) (map[string][]bool,error) {
// Convert input to uppercase
input = s.ToUpper(input)
// Allocate space for the map
m := make(map[string][]bool)
// Start by checking whether we have DNA
if CheckIsDNA(input)==false {
err := fmt.Sprintf("Error: input string was not DNA. Only characters ATCG are allowed, you had %s",input)
return m, errors.New(err)
}
// Important: we want to iterate over the
// DNA string ONCE and only once. That means
// we need to have the bit vectors initialized
// already, and as we step through the DNA
// string, we access the appropriate index
// of the appropriate bit vector and set
// it to true.
m["A"] = make([]bool, len(input))
m["T"] = make([]bool, len(input))
m["C"] = make([]bool, len(input))
m["G"] = make([]bool, len(input))
// To begin with, every bit vector is false.
for i,c := range input {
cs := string(c)
// Get the corresponding bit vector - O(1)
bitty := m[cs]
// Flip to true for this position - O(1)
bitty[i] = true
}
return m,nil
}
// Convert four bitmasks (one each for ATGC)
// into a DNA string.
func Bitmasks2DNA(bitmasks map[string][]bool) (string,error) {
// Verify ATGC keys are all present
_,Aok := bitmasks["A"]
_,Tok := bitmasks["T"]
_,Gok := bitmasks["G"]
_,Cok := bitmasks["C"]
if !(Aok && Tok && Gok && Cok) {
err := fmt.Sprintf("Error: input bitmask was missing one of: ATGC (Keys present? A: %t, T: %t, G: %t, C: %t)",Aok,Tok,Gok,Cok)
return "", errors.New(err)
}
// Hope that all bitmasks are the same size
size := len(bitmasks["A"])
// Make a rune array that we'll turn into
// a string for our final return value
dna := make([]rune,size)
// Iterate over the bitmask, using only
// the index and not the mask value itself
for i, _ := range bitmasks["A"] {
if bitmasks["A"][i] == true {
dna[i] = 'A'
} else if bitmasks["T"][i] == true {
dna[i] = 'T'
} else if bitmasks["G"][i] == true {
dna[i] = 'G'
} else if bitmasks["C"][i] == true {
dna[i] = 'C'
}
}
return string(dna),nil
}
// Given a DNA input string, find the
// complement. The complement swaps
// Gs and Cs, and As and Ts.
func Complement(input string) (string,error) {
// Convert input to uppercase
input = s.ToUpper(input)
// Start by checking whether we have DNA
if CheckIsDNA(input)==false {
return "", errors.New(fmt.Sprintf("Error: input string was not DNA. Only characters ATCG are allowed, you had %s",input))
}
m,_ := DNA2Bitmasks(input)
// Swap As and Ts
newT := m["A"]
newA := m["T"]
m["T"] = newT
m["A"] = newA
// Swap Cs and Gs
newG := m["C"]
newC := m["G"]
m["G"] = newG
m["C"] = newC
output,_ := Bitmasks2DNA(m)
return output,nil
}
// Given a DNA input string, find the
// reverse complement. The complement
// swaps Gs and Cs, and As and Ts.
// The reverse complement reverses that.
func ReverseComplement(input string) (string,error) {
// Convert input to uppercase
input = s.ToUpper(input)
// Start by checking whether we have DNA
if CheckIsDNA(input)==false {
err := fmt.Sprintf("Error: input string was not DNA. Only characters ATCG are allowed, you had %s",input)
return "", errors.New(err)
}
comp,_ := Complement(input)
revcomp := ReverseString(comp)
return revcomp,nil
}
////////////////////////////////
// BA1D
// Given a large string (genome) and a string (pattern),
// find the zero-based indices where pattern occurs in genome.
func FindOccurrences(pattern, genome string) ([]int,error) {
locations := []int{}
slots := len(genome)-len(pattern)+1
if slots<1 {
// pattern is longer than genome
return locations,nil
}
// Loop over each character,
// saving the position if it
// is the start of pattern
for i:=0; i<slots; i++ {
start := i
end := i+len(pattern)
if genome[start:end]==pattern {
locations = append(locations,i)
}
}
return locations,nil
}
////////////////////////////////
// BA1E
// Find k-mers (patterns) of length k occurring at least
// t times over an interval of length L in a genome.
func FindClumps(genome string, k, L, t int) ([]string,error) {
// Algorithm:
// allocate a list of kmers
// for each possible position of L window,
// feed string L to KmerHistogram()
// save any kmers with frequency >= t
// return master list of saved kmers
L_slots := len(genome)-L+1
// Set kmers
kmers := map[string]bool{}
// List kmers
kmers_list := []string{}
// Loop over each possible window of length L
for iL:=0; iL<L_slots; iL++ {
// Grab this portion of the genome
winstart := iL
winend := iL+L
genome_window := genome[winstart:winend]
// Get the kmers that occur at least
// t times in this window
new_kmers,err := MoreFrequentThanNKmers(genome_window,k,t)
if err!=nil {
return kmers_list,err
}
// Add these to the set kmers
for _,new_kmer := range new_kmers {
kmers[new_kmer] = true
}
}
for k := range kmers {
kmers_list = append(kmers_list,k)
}
sort.Strings(kmers_list)
return kmers_list,nil
}
////////////////////////////////
// BA1F
// The skew of a genome is the difference between
// the number of G and C nucleotides that have occurred
// cumulatively in a given strand of DNA.
// This function computes the positions in the genome
// at which the cumulative skew is minimized.
func MinSkewPositions(genome string) ([]int,error) {
n := len(genome)
cumulative_skew := make([]int,n+1)
// Get C/G bitmasks
bitmasks,err := DNA2Bitmasks(genome)
if err!=nil {
return cumulative_skew,err
}
c := bitmasks["C"]
g := bitmasks["G"]
// Init: the empty prefix (position 0) has skew 0
cumulative_skew[0] = 0
// Keep track of the minima encountered so far,
// starting from the empty prefix at position 0
min := 0
min_skew_ix := []int{0}
// At each position, compute the next skew value.
// We need two indices b/c for a genome of size N,
// the cumulative skew array index is of size N+1.
for i,ibit:=1,0; i<=n; i,ibit=i+1,ibit+1 {
var next int
// Next skew value
if c[ibit] {
// C -1
next = -1
} else if g[ibit] {
// G +1
next = 1
} else {
next = 0
}
cumulative_skew[i] = cumulative_skew[i-1] + next
if cumulative_skew[i] < min {
// New min and min_skew
min = cumulative_skew[i]
min_skew_ix = []int{i}
} else if cumulative_skew[i] == min {
// Additional min and min_skew
min_skew_ix = append(min_skew_ix,i)
}
}
return min_skew_ix,nil
}
////////////////////////////////
// BA1G
// Compute the Hamming distance between
// two strings. The Hamming distance is
// defined as the number of characters
// different between two strings.
func HammingDistance(p, q string) (int,error) {
// Technically a Hamming distance when
// one string is empty would be 0, but
// we will throw an error instead.
if len(p)==0 || len(q)==0 {
err := fmt.Sprintf("Error: HammingDistance: one or more arguments had length 0. len(p) = %d, len(q) = %d",len(p),len(q))
return -1,errors.New(err)
}
// Compare only up to the length of the shorter string
var m int
if len(p)>len(q) {
m = len(q)
} else {
m = len(p)
}
// Accumulate distance
dist := 0
for i:=0; i<m; i++ {
if p[i]!=q[i] {
dist += 1
}
}
return dist,nil
}
////////////////////////////////
// BA1H
// Given a large string (text) and a string (pattern),
// find the zero-based indices where we have an occurrence
// of pattern or a string with Hamming distance d or less
// from pattern.
func FindApproximateOccurrences(pattern, text string, d int) ([]int,error) {
locations := []int{}
slots := len(text)-len(pattern)+1
if slots<1 {
// pattern is longer than text
return locations,nil
}
// Loop over each window of text,
// saving the position if the window is
// within Hamming distance d of pattern
for i:=0; i<slots; i++ {
start := i
end := i+len(pattern)
poss_approx_pattern := text[start:end]
hamm,_ := HammingDistance(poss_approx_pattern,pattern)
if hamm<=d {
locations = append(locations,i)
}
}
return locations,nil
}

21
chapter01/todo.md

@@ -1,21 +0,0 @@
https://github.com/moul/euler
- use snakemake
main.go is a cli:
- given a problem...
- print url for problem
- duration
- answer
- awesome go
ba1c test
- not testing everything
- finish
code coverage
- https://mlafeldt.github.io/blog/test-coverage-in-go/
- go lint
- go test

95
chapter01/utils.go

@@ -1,95 +0,0 @@
package main
import (
"bufio"
"fmt"
"os"
)
// readLines reads a whole file into memory
// and returns a slice of its lines.
func readLines(path string) ([]string, error) {
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
var lines []string
scanner := bufio.NewScanner(file)
buf := make([]byte, 2)
// This is awkward.
// Scanners aren't good for big files,
// just simple stuff.
BIGNUMBER := 90000
scanner.Buffer(buf, BIGNUMBER)
for scanner.Scan() {
lines = append(lines, scanner.Text())
}
return lines, scanner.Err()
}
// writeLines writes the lines to the given file.
func writeLines(lines []string, path string) error {
file, err := os.Create(path)
if err != nil {
return err
}
defer file.Close()
w := bufio.NewWriter(file)
for _, line := range lines {
fmt.Fprintln(w, line)
}
return w.Flush()
}
// Utility function: check if two string arrays/array slices
// are equal. This is necessary because of squirrely
// behavior when comparing arrays (of type [1]string)
// and slices (of type []string).
func EqualStringSlices(a, b []string) bool {
if len(a)!=len(b) {
return false
}
for i:=0; i<len(a); i++ {
if a[i] != b[i] {
return false
}
}
return true
}
// Utility function: check if two boolean arrays/array slices
// are equal. This is necessary because of squirrely
// behavior when comparing arrays (of type [1]bool)
// and slices (of type []bool).
func EqualBoolSlices(a, b []bool) bool {
if len(a)!=len(b) {
return false
}
for i:=0; i<len(a); i++ {
if a[i] != b[i] {
return false
}
}
return true
}
// Utility function: check if two int arrays/array slices
// are equal.
func EqualIntSlices(a, b []int) bool {
if len(a)!=len(b) {
return false
}
for i:=0; i<len(a); i++ {
if a[i] != b[i] {
return false
}
}
return true
}

69
chapter1/Readme.md

@@ -0,0 +1,69 @@
# Rosalind Chapter 1
This folder contains the `chapter1` module, which
provides functions for each of the problems from
Chapter 1 of Rosalind.info's Bioinformatics Textbook
track.
## How to run
* Each problem has its own function (example: `BA1a(...)`)
* Each problem expects an input file
(example input files in `for_real` directory,
or provide the input file downloaded
from Rosalind.info)
* Pass the input file name to the function, like this:
`BA1a("rosalind_ba1a.txt")`
## Quick Start
To use the functions in this package, start by installing it:
```
go get github.com/charlesreid1/go-rosalind/chapter1
```
Once you have installed the `chapter1` package,
you can import it, then call the function for whichever
Rosalind.info problem you want to solve from Chapter 1:
```
package main
import (
rch1 "github.com/charlesreid1/go-rosalind/chapter1"
)
func main() {
rch1.BA1a("rosalind_ba1a.txt")
}
```
## Examples
See `chapter1_test.go` for examples.
## Tests
To run tests of all Chapter 1 problems, run
`go test` from this directory:
```
go test -v
```
or, from the parent directory, the root of the
go-rosalind repository:
```
go test -v ./chapter1/...
```
Note that this solves every problem in
Chapter 1 and prints the solutions (so there
is a lot of spew). It does not check the
solutions (for that, see the tests in the
`rosalind` library).

55
chapter1/ba1a.go

@@ -0,0 +1,55 @@
package rosalindchapter1
import (
"fmt"
"log"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1a: Compute the Number of Times a Pattern Appears in a Text
// Describe the problem
func BA1aDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1a:",
"Compute the Number of Times a Pattern Appears in a Text",
"",
"Given an input string and a pattern,",
"report the number of times the pattern",
"occurs in the input string.",
"",
"URL: http://rosalind.info/problems/ba1a/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem,
// print the name of the input file,
// print the output/result
func BA1a(filename string) {
BA1aDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("rosa.ReadLines: %v", err)
}
// Input file contents
var input, pattern string
input = lines[0]
pattern = lines[1]
result := rosa.PatternCount(input, pattern)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(result)
}

59
chapter1/ba1b.go

@@ -0,0 +1,59 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1b: Most Frequent k-mers
// Describe the problem
func BA1bDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1b:",
"Most Frequent k-mers",
"",
"Given an input string and a length k,",
"report the k-mer or k-mers that occur",
"most frequently.",
"",
"URL: http://rosalind.info/problems/ba1b/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1b(filename string) {
BA1bDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
input := lines[0]
k_str := lines[1]
k, err := strconv.Atoi(k_str)
if err != nil {
log.Fatalf("Error: string to int conversion: %v", err)
}
mfks, _ := rosa.MostFrequentKmers(input, k)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(mfks, " "))
}

51
chapter1/ba1c.go

@@ -0,0 +1,51 @@
package rosalindchapter1
import (
"fmt"
"log"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1c: Find the Reverse Complement of a String
// Describe the problem
func BA1cDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1c:",
"Find the Reverse Complement of a String",
"",
"Given a DNA input string,",
"find the reverse complement",
"of the DNA string.",
"",
"URL: http://rosalind.info/problems/ba1c/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1c(filename string) {
BA1cDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
input := lines[0]
result, _ := rosa.ReverseComplement(input)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(result)
}

61
chapter1/ba1d.go

@@ -0,0 +1,61 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1d: Find all occurrences of pattern in string
// Describe the problem
func BA1dDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1d:",
"Find all occurrences of pattern in string",
"",
"Given a string input (genome) and a substring (pattern),",
"return all starting positions in the genome where the",
"pattern occurs in the genome.",
"",
"URL: http://rosalind.info/problems/ba1d/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1d(filename string) {
BA1dDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
pattern := lines[0]
genome := lines[1]
// Result is a slice of ints
locs, _ := rosa.FindOccurrences(pattern, genome)
// Convert to a slice of strings for easier printing
locs_str := make([]string, len(locs))
for i, j := range locs {
locs_str[i] = strconv.Itoa(j)
}
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(locs_str, " "))
}

59
chapter1/ba1e.go

@@ -0,0 +1,59 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1e: Find patterns forming clumps in a string
// Describe the problem
func BA1eDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1e:",
"Find patterns forming clumps in a string",
"",
"A clump is characterized by integers L and t",
"if there is an interval in the genome of length L",
"in which a given pattern occurs t or more times.",
"",
"URL: http://rosalind.info/problems/ba1e/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1e(filename string) {
BA1eDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
genome := lines[0]
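params_str := lines[1]
params_slice := strings.Split(params_str, " ")
if len(params_slice) < 3 {
log.Fatalf("Error: expected 3 parameters (k, L, t), found %d", len(params_slice))
}
k, kerr := strconv.Atoi(params_slice[0])
L, Lerr := strconv.Atoi(params_slice[1])
t, terr := strconv.Atoi(params_slice[2])
if kerr != nil || Lerr != nil || terr != nil {
log.Fatalf("Error: string to int conversion for parameters k, L, t: %q", params_str)
}
patterns, _ := rosa.FindClumps(genome, k, L, t)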
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(patterns, " "))
}

61
chapter1/ba1f.go

@@ -0,0 +1,61 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1f: Find positions in a gene that minimize skew
// Describe the problem
func BA1fDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1f:",
"Find positions in a gene that minimize skew",
"",
"The skew of a genome is defined as the difference",
"between the number of G nucleotides and the number of C",
"nucleotides. Given a DNA string, this function should",
"compute the cumulative skew for each position in",
"the genome, and report the indices where the skew",
"value is minimized.",
"",
"URL: http://rosalind.info/problems/ba1f/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1f(filename string) {
BA1fDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
genome := lines[0]
minskew, _ := rosa.MinSkewPositions(genome)
minskew_str := make([]string, len(minskew))
for i, j := range minskew {
minskew_str[i] = strconv.Itoa(j)
}
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(minskew_str, " "))
}

53
chapter1/ba1g.go

@@ -0,0 +1,53 @@
package rosalindchapter1
import (
"fmt"
"log"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1g: Find Hamming distance between two DNA strings
// Describe the problem
func BA1gDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1g:",
"Find Hamming distance between two DNA strings",
"",
"The Hamming distance between two strings HammingDistance(p,q)",
"is the number of positions at which the two strings",
"differ. This program computes the Hamming distance",
"between two strings.",
"",
"URL: http://rosalind.info/problems/ba1g/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1g(filename string) {
BA1gDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
p := lines[0]
q := lines[1]
hamm, _ := rosa.HammingDistance(p, q)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(hamm)
}

66
chapter1/ba1h.go

@@ -0,0 +1,66 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1h: Find approximate occurrences of pattern in string
// Describe the problem
func BA1hDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1h:",
"Find approximate occurrences of pattern in string",
"",
"Given a string Text and a string Pattern, and a maximum",
"Hamming distance d, return all locations in Text where",
"there is an approximate match with Pattern (i.e., a pattern",
"with a Hamming distance from Pattern of d or less).",
"",
"URL: http://rosalind.info/problems/ba1h/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1h(filename string) {
BA1hDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
pattern := lines[0]
text := lines[1]
d_str := lines[2]
d, err := strconv.Atoi(d_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter d: %v", err)
}
approx, _ := rosa.FindApproximateOccurrences(pattern, text, d)
// Convert to a slice of strings for easier printing
// (strconv.Itoa cannot fail, so no error check is needed)
approx_str := make([]string, len(approx))
for i, j := range approx {
approx_str[i] = strconv.Itoa(j)
}
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(approx_str, " "))
}

70
chapter1/ba1i.go

@@ -0,0 +1,70 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1i: Most Frequent Words with Mismatches
// Describe the problem
func BA1iDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1i:",
"Most Frequent Words with Mismatches",
"",
"Given an input string, a kmer length k, and a maximum",
"allowable Hamming distance d, report the kmer(s) with",
"the most occurrences in the input string, counting any",
"substring within Hamming distance d as an occurrence.",
"",
"URL: http://rosalind.info/problems/ba1i/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1i(filename string) {
BA1iDescription()
// Read the contents of the input file
// into a single string
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
input := lines[0]
params := strings.Split(lines[1], " ")
if len(params) < 2 {
log.Fatalf("Error splitting second line: expected 2 tokens (k and d), found %d", len(params))
}
k_str, d_str := params[0], params[1]
k, err := strconv.Atoi(k_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
d, err := strconv.Atoi(d_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter d: %v", err)
}
mfks_mis, _ := rosa.MostFrequentKmersMismatches(input, k, d)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(mfks_mis, " "))
}

71
chapter1/ba1j.go

@ -0,0 +1,71 @@ @@ -0,0 +1,71 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1j: Most Frequent Words with Mismatches and Reverse Complements
// Describe the problem
func BA1jDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1j:",
"Most Frequent Words with Mismatches and Reverse Complements",
"",
"Given an input string, a kmer length k, and a",
"maximum allowable Hamming distance d, report the",
"kmers that occur most frequently, with up to d",
"mismatches, in the input string and in the",
"reverse complement of the input string.",
"",
"URL: http://rosalind.info/problems/ba1j/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1j(filename string) {
BA1jDescription()
// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
input := lines[0]
params := strings.Split(lines[1], " ")
if len(params) < 2 {
log.Fatalf("Error splitting second line: expected 2 tokens (k and d), found %d", len(params))
}
k_str, d_str := params[0], params[1]
k, err := strconv.Atoi(k_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
d, err := strconv.Atoi(d_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter d: %v", err)
}
mfks_mis, _ := rosa.MostFrequentKmersMismatchesRevComp(input, k, d)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(mfks_mis, " "))
}

62
chapter1/ba1k.go

@ -0,0 +1,62 @@ @@ -0,0 +1,62 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1k: Generate Frequency Array
// Describe the problem
func BA1kDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1k:",
"Generate Frequency Array",
"",
"Given an integer k, generate the frequency array of",
"an input string. The frequency array has 4^k entries,",
"one per kmer in lexicographic order; each entry is the",
"number of times that kmer occurs in the input string.",
"",
"URL: http://rosalind.info/problems/ba1k/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1k(filename string) {
BA1kDescription()
// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
input := lines[0]
k_str := lines[1]
k, err := strconv.Atoi(k_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
arr, _ := rosa.FrequencyArray(input, k)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
for _, e := range arr {
fmt.Print(e, " ")
}
// Terminate the line of space-separated counts
fmt.Println("")
}
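The frequency array described for BA1k can be sketched without the `rosalind` library. This is a minimal sketch and not the library's `FrequencyArray` implementation: the helper names `kmerIndex` and `frequencyArray` are hypothetical, and it assumes the input contains only A, C, G, and T, indexed in lexicographic order (A=0, C=1, G=2, T=3).

```go
package main

import "fmt"

// kmerIndex (hypothetical helper) maps a kmer to its lexicographic index
// by reading it as a base-4 number with A=0, C=1, G=2, T=3.
func kmerIndex(kmer string) (int, error) {
	index := 0
	for _, c := range kmer {
		var digit int
		switch c {
		case 'A':
			digit = 0
		case 'C':
			digit = 1
		case 'G':
			digit = 2
		case 'T':
			digit = 3
		default:
			return 0, fmt.Errorf("invalid nucleotide %q", c)
		}
		index = index*4 + digit
	}
	return index, nil
}

// frequencyArray (hypothetical helper) returns a slice of 4^k counts,
// where entry i is the number of occurrences of the kmer with index i.
func frequencyArray(text string, k int) ([]int, error) {
	if k < 1 || k > len(text) {
		return nil, fmt.Errorf("k must be between 1 and len(text), got %d", k)
	}
	counts := make([]int, 1<<(2*uint(k))) // 4^k entries
	for i := 0; i+k <= len(text); i++ {
		idx, err := kmerIndex(text[i : i+k])
		if err != nil {
			return nil, err
		}
		counts[idx]++
	}
	return counts, nil
}

func main() {
	arr, err := frequencyArray("ACGCGGCTCTGAAA", 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints [2 1 0 0 0 0 2 2 1 2 1 0 0 1 1 0]
	fmt.Println(arr)
}
```

The array grows as 4^k, so this layout is only practical for small k; a map keyed by kmer would be the usual alternative for larger values.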

51
chapter1/ba1lima.go

@ -0,0 +1,51 @@ @@ -0,0 +1,51 @@
package rosalindchapter1
import (
"fmt"
"log"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1L: Pattern to Number
// Describe the problem
func BA1LDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1L:",
"Pattern to Number",
"",
"Given an input kmer of length k, convert it to",
"an integer corresponding to its lexicographic",
"order among kmers of length k.",
"",
"URL: http://rosalind.info/problems/ba1l/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1L(filename string) {
BA1LDescription()
// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
input := lines[0]
number, _ := rosa.PatternToNumber(input)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(number)
}

62
chapter1/ba1m.go

@ -0,0 +1,62 @@ @@ -0,0 +1,62 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1m: Number to Pattern
// Describe the problem
func BA1mDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1m:",
"Number to Pattern",
"",
"Given an integer and a kmer length k, convert",
"the integer to its corresponding kmer.",
"",
"URL: http://rosalind.info/problems/ba1m/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1m(filename string) {
BA1mDescription()
// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
number_str := lines[0]
k_str := lines[1]
number, err := strconv.Atoi(number_str)
if err != nil {
log.Fatalf("Error: string to int conversion for number: %v", err)
}
k, err := strconv.Atoi(k_str)
if err != nil {
log.Fatalf("Error: string to int conversion for k: %v", err)
}
result, _ := rosa.NumberToPattern(number, k)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(result)
}
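The conversion described for BA1m amounts to writing the integer in base 4 with exactly k digits. The following is a minimal sketch under that reading, not the library's `NumberToPattern`; the helper name `numberToPattern` is hypothetical and the digit mapping A=0, C=1, G=2, T=3 is assumed.

```go
package main

import (
	"errors"
	"fmt"
)

// numberToPattern (hypothetical helper) writes number in base 4 using
// exactly k digits and maps the digits 0, 1, 2, 3 to A, C, G, T.
func numberToPattern(number, k int) (string, error) {
	if number < 0 || k < 1 {
		return "", errors.New("number must be >= 0 and k must be >= 1")
	}
	const alphabet = "ACGT"
	buf := make([]byte, k)
	// Fill from the right: the least significant digit is the last nucleotide.
	for i := k - 1; i >= 0; i-- {
		buf[i] = alphabet[number%4]
		number /= 4
	}
	if number != 0 {
		return "", errors.New("number does not fit in k nucleotides")
	}
	return string(buf), nil
}

func main() {
	// Same values as for_real/rosalind_ba1m.txt: number 6003, k 9.
	pattern, err := numberToPattern(6003, 9)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pattern) // AACCTCTAT
}
```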

60
chapter1/ba1n.go

@ -0,0 +1,60 @@ @@ -0,0 +1,60 @@
package rosalindchapter1
import (
"fmt"
"log"
"strconv"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Rosalind: Problem BA1n: Calculating d-Neighborhood of String
// Describe the problem
func BA1nDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA1n:",
"Calculating d-Neighborhood of String",
"",
"Given an input string of DNA and a Hamming",
"distance d, compute all DNA strings that",
"are a Hamming distance of up to d away.",
"",
"URL: http://rosalind.info/problems/ba1n/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Describe the problem, and call the function
func BA1n(filename string) {
BA1nDescription()
// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("Error: rosa.ReadLines: %v", err)
}
// Input file contents
input := lines[0]
d_str := lines[1]
d, err := strconv.Atoi(d_str)
if err != nil {
log.Fatalf("Error: string to int conversion for d: %v", err)
}
result, _ := rosa.VisitHammingNeighbors(input, d)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
for _, j := range result {
fmt.Println(j)
}
}
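The d-neighborhood described for BA1n can be enumerated recursively: at each position, either keep the nucleotide or spend one of the remaining mismatches on one of the other three. This is a minimal sketch of that idea, not the library's `VisitHammingNeighbors`; the helper name `hammingNeighbors` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// hammingNeighbors (hypothetical helper) returns every string over ACGT
// within Hamming distance d of pattern, in sorted order.
func hammingNeighbors(pattern string, d int) []string {
	var result []string
	buf := []byte(pattern)
	var visit func(pos, remaining int)
	visit = func(pos, remaining int) {
		if pos == len(buf) {
			result = append(result, string(buf))
			return
		}
		original := buf[pos]
		// Keep the original nucleotide at this position.
		visit(pos+1, remaining)
		// Or spend one mismatch on each of the three other nucleotides.
		if remaining > 0 {
			for _, c := range []byte("ACGT") {
				if c != original {
					buf[pos] = c
					visit(pos+1, remaining-1)
				}
			}
			buf[pos] = original
		}
	}
	visit(0, d)
	sort.Strings(result)
	return result
}

func main() {
	for _, neighbor := range hammingNeighbors("ACG", 1) {
		fmt.Println(neighbor)
	}
}
```

Each sequence of choices yields a distinct string, so no deduplication is needed. For a pattern of length 3 and d = 1 this gives 1 + 3*3 = 10 strings.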

20
chapter1/chapter1_test.go

@ -0,0 +1,20 @@ @@ -0,0 +1,20 @@
package rosalindchapter1
import "testing"
func TestChapter01(t *testing.T) {
BA1a("for_real/rosalind_ba1a.txt")
BA1b("for_real/rosalind_ba1b.txt")
BA1c("for_real/rosalind_ba1c.txt")
BA1d("for_real/rosalind_ba1d.txt")
BA1e("for_real/rosalind_ba1e.txt")
BA1f("for_real/rosalind_ba1f.txt")
BA1g("for_real/rosalind_ba1g.txt")
BA1h("for_real/rosalind_ba1h.txt")
BA1i("for_real/rosalind_ba1i.txt")
BA1j("for_real/rosalind_ba1j.txt")
BA1k("for_real/rosalind_ba1k.txt")
BA1L("for_real/rosalind_ba1l.txt")
BA1m("for_real/rosalind_ba1m.txt")
BA1n("for_real/rosalind_ba1n.txt")
}

0
chapter01/for_real/rosalind_ba1a.txt → chapter1/for_real/rosalind_ba1a.txt

0
chapter01/for_real/rosalind_ba1b.txt → chapter1/for_real/rosalind_ba1b.txt

0
chapter01/for_real/rosalind_ba1c.txt → chapter1/for_real/rosalind_ba1c.txt

0
chapter01/for_real/rosalind_ba1d.txt → chapter1/for_real/rosalind_ba1d.txt

0
chapter01/for_real/rosalind_ba1e.txt → chapter1/for_real/rosalind_ba1e.txt

0
chapter01/for_real/rosalind_ba1f.txt → chapter1/for_real/rosalind_ba1f.txt

0
chapter01/for_real/rosalind_ba1g.txt → chapter1/for_real/rosalind_ba1g.txt

0
chapter01/for_real/rosalind_ba1h.txt → chapter1/for_real/rosalind_ba1h.txt

2
chapter1/for_real/rosalind_ba1i.txt

@ -0,0 +1,2 @@ @@ -0,0 +1,2 @@
CAGTGTAAGTAACGGATTGAGGACGTAACGGACTAGTATTCGAGGACAGTGTAATTGAGGACGTAACGGAGTAACGGATCGAGGACTAGTATCAGTGTAATTGAGGACGTAACGGAGTAACGGACAGTGTAACAGTGTAACTAGTATGTAACGGACAGTGTAAGTAACGGAGTAACGGAGTAACGGATCGAGGATTGAGGACCTAGTATCTAGTATTCGAGGATCGAGGATTGAGGACCTAGTATCTAGTATGTAACGGATTGAGGACTTGAGGACCTAGTATTCGAGGATCGAGGAGTAACGGACAGTGTAACAGTGTAATCGAGGATCGAGGACAGTGTAATTGAGGACTCGAGGACTAGTATTTGAGGACTCGAGGATTGAGGACGTAACGGAGTAACGGATCGAGGACTAGTATGTAACGGAGTAACGGACAGTGTAACTAGTATTTGAGGACCAGTGTAACAGTGTAACAGTGTAACAGTGTAACAGTGTAACTAGTATGTAACGGAGTAACGGATTGAGGACGTAACGGAGTAACGGATCGAGGATTGAGGACCTAGTATTTGAGGACGTAACGGATTGAGGACCTAGTATCTAGTATCAGTGTAACTAGTATGTAACGGATCGAGGATCGAGGACAGTGTAATTGAGGACTTGAGGACCAGTGTAATCGAGGATTGAGGACTTGAGGACTTGAGGACTCGAGGACAGTGTAAGTAACGGAGTAACGGATCGAGGACAGTGTAATTGAGGACCTAGTATTTGAGGACCTAGTATGTAACGGATTGAGGACCAGTGTAACTAGTATCTAGTATCTAGTATCAGTGTAATTGAGGACTCGAGGATTGAGGAC
6 2

2
chapter1/for_real/rosalind_ba1j.txt

@ -0,0 +1,2 @@ @@ -0,0 +1,2 @@
TTACTCGCTGGCAGGTTGACGGAGAAATATTGGTGACGGAGAAGACGGAGAATGGGCATATATTGGTTGGCAGGTTTGGGCATTTACTCGCGACGGAGAATTACTCGCTGGGCATTTACTCGCTGGGCATTTACTCGCTGGCAGGTTTGGCAGGTTATATTGGTATATTGGTATATTGGTTGGGCATTTACTCGCGACGGAGAATGGCAGGTTGACGGAGAAGACGGAGAAATATTGGTTTACTCGCATATTGGTGACGGAGAAATATTGGTTTACTCGCTTACTCGCTGGGCATTGGGCATTGGCAGGTTGACGGAGAAGACGGAGAATTACTCGCATATTGGTTTACTCGCGACGGAGAATTACTCGCATATTGGTGACGGAGAAGACGGAGAATTACTCGCTGGCAGGTTTGGGCATTGGGCATTTACTCGCTGGCAGGTTTGGGCATTGGCAGGTTGACGGAGAAGACGGAGAATGGCAGGTTTGGCAGGTTTGGCAGGTTTGGCAGGTTTGGGCATGACGGAGAATTACTCGCTGGCAGGTTTTACTCGCTGGCAGGTTTTACTCGCATATTGGTTGGCAGGTTTTACTCGCTTACTCGCTTACTCGCGACGGAGAAGACGGAGAAATATTGGTATATTGGTATATTGGTTGGCAGGTTTGGCAGGTTTGGCAGGTTATATTGGTTTACTCGCTTACTCGCATATTGGTTGGCAGGTTTGGGCATTGGCAGGTTTGGCAGGTTGACGGAGAATGGCAGGTTGACGGAGAAGACGGAGAATGGGCATTGGGCATGACGGAGAATGGCAGGTT
5 3

2
chapter1/for_real/rosalind_ba1k.txt

@ -0,0 +1,2 @@ @@ -0,0 +1,2 @@
CAATGAGTGATATTGTTTGGTAGCAATCCATAGTTGAGGCCCTACGGAAGTTGCATCCGGGGCCCGTAGGACTCGCGGGCAAAAGATTGCTAAGCATTCTTGGTCACCATCGCAGTATTGCTCGTAGTCGGGTGGGTTTGCCGAACTGATAATGTGCCAGTCCCCGCGGAACCGGAATCAGGGCAACGGCTAGAGATACTCTCCGTGGGTCCTAAGTAGGAGGCTTGGGGCTGAGTGAGCAACCACTTACTCGAGTGTGTTGTTTTCTGTGCGTCCCCCGGGCGGTGTTCATTTAAGGATGACCGGGTGAGTAACCGAACAATTTTGTTGCCATGAAACGCGGCAATAACTCAATCTACCAGTACGGACAAATATAATGTTGGGCCCTTTTAGCTTAACGGACGTCGTCCCATTCTGACCTTAACTAAGACTATAAGGTAGGGGGTCAGATACGACACGGTCAGTAGGTGGATATACCGTGACAAATACCGGCACCTATGCTAATTGCGATTTGGAATGGAACGCGCCGAATACTTCGGATCATATCACCGTCCCTGTACTCGAAAGTTCTGCCACGAACAAGTCTCCTACTTGTGTCTTTTCTCACTGCGAAG
5

1
chapter1/for_real/rosalind_ba1l.txt

@ -0,0 +1 @@ @@ -0,0 +1 @@
TGCCGTATTGACGAACACCGAGCCCTAAT

2
chapter1/for_real/rosalind_ba1m.txt

@ -0,0 +1,2 @@ @@ -0,0 +1,2 @@
6003
9

2
chapter1/for_real/rosalind_ba1n.txt

@ -0,0 +1,2 @@ @@ -0,0 +1,2 @@
TGCCCTAG
3

1
chapter1/utils.go

@ -0,0 +1 @@ @@ -0,0 +1 @@
package rosalindchapter1

69
chapter2/Readme.md

@ -0,0 +1,69 @@ @@ -0,0 +1,69 @@
# Rosalind Chapter 2
This folder contains the `chapter2` module, which
provides functions for each of the problems from
Chapter 2 of Rosalind.info's Bioinformatics Textbook
track.
## How to run
* Each problem has its own function (example: `BA2a(...)`)
* Each problem expects an input file
(example input files in `for_real` directory,
or provide the input file downloaded
from Rosalind.info)
* Pass the input file name to the function, like this:
`BA2a("rosalind_ba2a.txt")`
## Quick Start
To use the functions in this package, start by installing it:
```
go get github.com/charlesreid1/go-rosalind/chapter2
```
Once you have installed the `chapter2` package,
you can import it, then call the function for whichever
Rosalind.info problem you want to solve from Chapter 2:
```
package main
import (
rch2 "github.com/charlesreid1/go-rosalind/chapter2"
)
func main() {
rch2.BA2a("rosalind_ba2a.txt")
}
```
## Examples
See `chapter2_test.go` for examples.
## Tests
To run tests of all Chapter 2 problems, run
`go test` from this directory:
```
go test -v
```
or, from the parent directory, the root of the
go-rosalind repository:
```
go test -v ./chapter2/...
```
Note that this runs every problem in
Chapter 2 and prints the solutions, so the
output is long. It does not check the
solutions; for that, see the tests in the
`rosalind` library.

67
chapter2/ba2a.go

@ -0,0 +1,67 @@ @@ -0,0 +1,67 @@
package rosalindchapter2
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Print problem description for Rosalind.info
// Problem BA2a: Implement Motif Enumeration
func BA2aDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA2a:",
"Implement Motif Enumeration",
"",
"Given a collection of strings of DNA, find all (k,d)-motifs: kmers of length k that appear in every DNA string with at most d mismatches.",
"",
"URL: http://rosalind.info/problems/ba2a/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Run the problem
func BA2a(filename string) {
BA2aDescription()
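// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("ReadLines: %v", err)
}
// Input file contents
params := strings.Split(lines[0], " ")
if len(params) < 2 {
log.Fatalf("Error splitting first line: expected 2 tokens (k and d), found %d", len(params))
}
k, err := strconv.Atoi(params[0])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
d, err := strconv.Atoi(params[1])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter d: %v", err)
}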
// 1 line in the input file is for
// parameters/gold standard.
// The rest of the lines are DNA strings.
// Make space for DNA strings
dna := make([]string, len(lines)-1)
iLstart := 1
iLend := len(lines)
// Two counters:
// one for the line index (iL),
// one for the array index (iA).
for iA, iL := 0, iLstart; iL < iLend; iA, iL = iA+1, iL+1 {
dna[iA] = lines[iL]
}
results, _ := rosa.FindMotifs(dna, k, d)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(results, " "))
}

61
chapter2/ba2b.go

@ -0,0 +1,61 @@ @@ -0,0 +1,61 @@
package rosalindchapter2
import (
"fmt"
"log"
"strconv"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Print problem description for Rosalind.info
// Problem BA2b: Find a Median String
func BA2bDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA2b:",
"Find a Median String",
"",
"Given a kmer length k and a set of strings of DNA, find the kmer(s) that minimize the sum, over all DNA strings, of the minimum Hamming distance between the kmer and any kmer in that string.",
"",
"URL: http://rosalind.info/problems/ba2b/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Run the problem
func BA2b(filename string) {
BA2bDescription()
// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("rosa.ReadLines: %v", err)
}
// Input file contents
k_str := lines[0]
k, err := strconv.Atoi(k_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
// Make space for DNA strings
dna := make([]string, len(lines)-1)
iLstart := 1
iLend := len(lines)
// Two counters:
// one for the line index (iL),
// one for the array index (iA).
for iA, iL := 0, iLstart; iL < iLend; iA, iL = iA+1, iL+1 {
dna[iA] = lines[iL]
}
results, _ := rosa.MedianString(dna, k)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(results)
}
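The median string objective described for BA2b can be illustrated with a brute-force search over all 4^k kmers. This is a minimal sketch and not the library's `MedianString`: the helper names `hamming`, `minDistance`, and `medianString` are hypothetical, and unlike the call above it returns only the first best kmer in lexicographic order.

```go
package main

import "fmt"

// hamming counts mismatched positions; it assumes len(a) == len(b).
func hamming(a, b string) int {
	count := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			count++
		}
	}
	return count
}

// minDistance (hypothetical helper) returns the smallest Hamming distance
// between pattern and any window of text with the same length.
func minDistance(pattern, text string) int {
	best := len(pattern) + 1
	for i := 0; i+len(pattern) <= len(text); i++ {
		if h := hamming(pattern, text[i:i+len(pattern)]); h < best {
			best = h
		}
	}
	return best
}

// medianString (hypothetical helper) tries every kmer and keeps the one
// with the smallest total distance to the DNA strings.
func medianString(dna []string, k int) string {
	const alphabet = "ACGT"
	bestKmer, bestScore := "", -1
	buf := make([]byte, k)
	total := 1 << (2 * uint(k)) // 4^k candidate kmers
	for n := 0; n < total; n++ {
		// Decode n as a k-digit base-4 number to get the candidate kmer.
		m := n
		for i := k - 1; i >= 0; i-- {
			buf[i] = alphabet[m%4]
			m /= 4
		}
		kmer := string(buf)
		score := 0
		for _, text := range dna {
			score += minDistance(kmer, text)
		}
		if bestScore < 0 || score < bestScore {
			bestKmer, bestScore = kmer, score
		}
	}
	return bestKmer
}

func main() {
	fmt.Println(medianString([]string{"TTG", "GTT"}, 2)) // TT
}
```

The search costs on the order of 4^k times the total input length, which is acceptable for the small k used in this problem but not for long motifs.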

54
chapter2/ba2c.go

@ -0,0 +1,54 @@ @@ -0,0 +1,54 @@
package rosalindchapter2
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Print problem description for Rosalind.info
// Problem BA2c: Find a Profile-most Probable k-mer in a String
func BA2cDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA2c:",
"Find a Profile-most Probable k-mer in a String",
"",
"Given a DNA string and a profile matrix, find the k-mer in the string that is most probable under the profile.",
"",
"URL: http://rosalind.info/problems/ba2c/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Run the problem
func BA2c(filename string) {
BA2cDescription()
// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("rosa.ReadLines: %v", err)
}
// Input file contents
dna := lines[0]
k_str := lines[1]
k, err := strconv.Atoi(k_str)
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
// Lines 3-6 of the input hold the four rows
// of the profile matrix, each with k columns
profile, _ := rosa.ReadMatrix32(lines[2:6], k)
// Find the most probable kmer
result, _ := rosa.ProfileMostProbableKmers(dna, k, profile)
fmt.Println(strings.Join(result, " "))
}

67
chapter2/ba2d.go

@ -0,0 +1,67 @@ @@ -0,0 +1,67 @@
package rosalindchapter2
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Print problem description for Rosalind.info
// Problem BA2d: Implement GreedyMotifSearch
func BA2dDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA2d:",
"Implement GreedyMotifSearch",
"",
"Find a collection of motif strings using a greedy motif search. When several kmers are equally profile-most probable, use the first one that occurs.",
"",
"URL: http://rosalind.info/problems/ba2d/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Run the problem
func BA2d(filename string) {
BA2dDescription()
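// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("rosa.ReadLines: %v", err)
}
// Input file contents
params := strings.Split(lines[0], " ")
if len(params) < 2 {
log.Fatalf("Error splitting first line: expected 2 tokens (k and t), found %d", len(params))
}
k, err := strconv.Atoi(params[0])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
t, err := strconv.Atoi(params[1])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter t: %v", err)
}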
// 1 line in the input file is for
// parameters.
// The rest of the lines are DNA strings.
// Make space for DNA strings
dna := make([]string, len(lines)-1)
iLstart := 1
iLend := len(lines)
// Two counters:
// one for the line index (iL),
// one for the array index (iA).
for iA, iL := 0, iLstart; iL < iLend; iA, iL = iA+1, iL+1 {
dna[iA] = lines[iL]
}
result, _ := rosa.GreedyMotifSearchNoPseudocounts(dna, k, t)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(result, " "))
}

67
chapter2/ba2e.go

@ -0,0 +1,67 @@ @@ -0,0 +1,67 @@
package rosalindchapter2
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Print problem description for Rosalind.info
// Problem BA2e: Implement GreedyMotifSearch with Pseudocounts
func BA2eDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA2e:",
"Implement GreedyMotifSearch with Pseudocounts",
"",
"Re-implement problem BA2d (greedy motif search) using pseudocounts, which keep profile probabilities from being exactly zero.",
"",
"URL: http://rosalind.info/problems/ba2e/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Run the problem
func BA2e(filename string) {
BA2eDescription()
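// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("rosa.ReadLines: %v", err)
}
// Input file contents
params := strings.Split(lines[0], " ")
if len(params) < 2 {
log.Fatalf("Error splitting first line: expected 2 tokens (k and t), found %d", len(params))
}
k, err := strconv.Atoi(params[0])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
t, err := strconv.Atoi(params[1])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter t: %v", err)
}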
// 1 line in the input file is for
// parameters.
// The rest of the lines are DNA strings.
// Make space for DNA strings
dna := make([]string, len(lines)-1)
iLstart := 1
iLend := len(lines)
// Two counters:
// one for the line index (iL),
// one for the array index (iA).
for iA, iL := 0, iLstart; iL < iLend; iA, iL = iA+1, iL+1 {
dna[iA] = lines[iL]
}
result, _ := rosa.GreedyMotifSearchPseudocounts(dna, k, t)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(result, " "))
}
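The pseudocount idea in the BA2e description can be shown on the profile construction alone. This is a minimal sketch that assumes Laplace's rule (add 1 to every count), not the library's `GreedyMotifSearchPseudocounts`; the helper name `profileWithPseudocounts` is hypothetical and the rows are assumed to be ordered A, C, G, T.

```go
package main

import "fmt"

// profileWithPseudocounts (hypothetical helper) builds a 4 x k profile
// from equal-length motifs, starting every count at 1 so that no
// probability is exactly zero.
func profileWithPseudocounts(motifs []string) ([][]float64, error) {
	if len(motifs) == 0 {
		return nil, fmt.Errorf("at least one motif is required")
	}
	k := len(motifs[0])
	rows := map[byte]int{'A': 0, 'C': 1, 'G': 2, 'T': 3}
	profile := make([][]float64, 4)
	for r := range profile {
		profile[r] = make([]float64, k)
		for j := range profile[r] {
			profile[r][j] = 1 // pseudocount
		}
	}
	for _, motif := range motifs {
		if len(motif) != k {
			return nil, fmt.Errorf("motif %q does not have length %d", motif, k)
		}
		for j := 0; j < k; j++ {
			r, ok := rows[motif[j]]
			if !ok {
				return nil, fmt.Errorf("invalid nucleotide %q", motif[j])
			}
			profile[r][j]++
		}
	}
	// Each column holds len(motifs) real counts plus 4 pseudocounts.
	denom := float64(len(motifs) + 4)
	for r := range profile {
		for j := range profile[r] {
			profile[r][j] /= denom
		}
	}
	return profile, nil
}

func main() {
	profile, err := profileWithPseudocounts([]string{"AC", "AG"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for r, name := range []string{"A", "C", "G", "T"} {
		fmt.Println(name, profile[r])
	}
}
```

Without the pseudocount, T would have probability 0 in both columns here, and any kmer containing T would score 0 no matter how well the rest of it matched.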

64
chapter2/ba2f.go

@ -0,0 +1,64 @@ @@ -0,0 +1,64 @@
package rosalindchapter2
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Print problem description for Rosalind.info
// Problem BA2f: Implement RandomizedMotifSearch with Pseudocounts
func BA2fDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA2f:",
"Implement RandomizedMotifSearch with Pseudocounts",
"",
"Re-implement problem BA2e (greedy motif search with pseudocounts) but use a random, instead of greedy, algorithm to pick motif kmers from each DNA string.",
"",
"URL: http://rosalind.info/problems/ba2f/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Run the problem
func BA2f(filename string) {
BA2fDescription()
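// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("rosa.ReadLines: %v", err)
}
// Input file contents
params := strings.Split(lines[0], " ")
if len(params) < 2 {
log.Fatalf("Error splitting first line: expected 2 tokens (k and t), found %d", len(params))
}
k, err := strconv.Atoi(params[0])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
t, err := strconv.Atoi(params[1])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter t: %v", err)
}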
// Make space for DNA strings
dna := make([]string, len(lines)-1)
iLstart := 1
iLend := len(lines)
// Two counters:
// one for the line index (iL),
// one for the array index (iA).
for iA, iL := 0, iLstart; iL < iLend; iA, iL = iA+1, iL+1 {
dna[iA] = lines[iL]
}
n := 100
result, _ := rosa.ManyRandomMotifSearches(dna, k, t, n)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(result, "\n"))
}

65
chapter2/ba2g.go

@ -0,0 +1,65 @@ @@ -0,0 +1,65 @@
package rosalindchapter2
import (
"fmt"
"log"
"strconv"
"strings"
rosa "github.com/charlesreid1/go-rosalind/rosalind"
)
// Print problem description for Rosalind.info
// Problem BA2g: Implement GibbsSampler
func BA2gDescription() {
description := []string{
"-----------------------------------------",
"Rosalind: Problem BA2g:",
"Implement GibbsSampler",
"",
"Use a profile to compute the probability of each kmer in a DNA string, and assemble these into a list of probabilities. GibbsSampler uses this list as a weighted random number generator to pick a random k-mer.",
"",
"URL: http://rosalind.info/problems/ba2g/",
"",
}
for _, line := range description {
fmt.Println(line)
}
}
// Run the problem
func BA2g(filename string) {
BA2gDescription()
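// Read the contents of the input file
// into a slice of lines
lines, err := rosa.ReadLines(filename)
if err != nil {
log.Fatalf("rosa.ReadLines: %v", err)
}
// Input file contents
params := strings.Split(lines[0], " ")
if len(params) < 2 {
log.Fatalf("Error splitting first line: expected 2 tokens (k and t), found %d", len(params))
}
k, err := strconv.Atoi(params[0])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter k: %v", err)
}
t, err := strconv.Atoi(params[1])
if err != nil {
log.Fatalf("Error: string to int conversion for parameter t: %v", err)
}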
// Make space for DNA strings
dna := make([]string, len(lines)-1)
iLstart := 1
iLend := len(lines)
// Two counters:
// one for the line index (iL),
// one for the array index (iA).
for iA, iL := 0, iLstart; iL < iLend; iA, iL = iA+1, iL+1 {
dna[iA] = lines[iL]
}
n := 100
n_starts := 20
result, _ := rosa.ManyGibbsSamplers(dna, k, t, n, n_starts)
fmt.Println("")
fmt.Printf("Computed result from input file: %s\n", filename)
fmt.Println(strings.Join(result, "\n"))
}
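The weighted random choice mentioned in the BA2g description can be sketched as a cumulative-sum lookup. This is a minimal sketch of that one step, not the library's `ManyGibbsSamplers`; the helper name `pickWeighted` is hypothetical. It takes the uniform sample `u` as a parameter so that the selection logic is deterministic and easy to check, and the weights do not need to sum to 1.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickWeighted (hypothetical helper) maps a uniform sample u in [0, 1)
// to an index, choosing index i with probability weights[i] / sum(weights).
// It returns -1 for an empty slice.
func pickWeighted(weights []float64, u float64) int {
	if len(weights) == 0 {
		return -1
	}
	total := 0.0
	for _, w := range weights {
		total += w
	}
	if total <= 0 {
		// All weights are zero: fall back to a uniform choice.
		return int(u * float64(len(weights)))
	}
	target := u * total
	cumulative := 0.0
	for i, w := range weights {
		cumulative += w
		if target < cumulative {
			return i
		}
	}
	// Guard against floating-point rounding at the upper edge.
	return len(weights) - 1
}

func main() {
	// Unnormalized kmer probabilities for three candidate positions.
	weights := []float64{1, 2, 7}
	counts := make([]int, len(weights))
	for trial := 0; trial < 10000; trial++ {
		counts[pickWeighted(weights, rand.Float64())]++
	}
	// Counts should be roughly proportional to 1 : 2 : 7.
	fmt.Println(counts)
}
```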

13
chapter2/chapter2_test.go

@ -0,0 +1,13 @@ @@ -0,0 +1,13 @@
package rosalindchapter2
import "testing"
func TestChapter02(t *testing.T) {
//BA2a("for_real/rosalind_ba2a.txt")
//BA2b("for_real/rosalind_ba2b.txt")
//BA2c("for_real/rosalind_ba2c.txt")
//BA2d("for_real/rosalind_ba2d.txt")
//BA2e("for_real/rosalind_ba2e.txt")
//BA2f("for_real/rosalind_ba2f.txt")
BA2g("for_real/rosalind_ba2g.txt")
}

11
chapter2/for_real/rosalind_ba2a.txt

@ -0,0 +1,11 @@ @@ -0,0 +1,11 @@
5 1
GATTTGGGCCAAAGTCTGCGGCGAA
GATGTGCGTCAACCAGTCGGAGTCC
TCACACCGGCTCGGAGATTTTTTTT
GATCTACAACGCGTGACTATATGCT
TAAGTGATTTTGTGGCCTTTACTCG
CCATCTACCCGATGTTCGACCGCGT
GAGCGCGCTGCCTACATTTGGATCT
TCCGGGTTAGGATGTTGAAACAAAA
ATGGAGCCATGATATGTACACTTAG
GCATGGATCTTACTCCGACGTTATC

11
chapter2/for_real/rosalind_ba2b.txt

@ -0,0 +1,11 @@ @@ -0,0 +1,11 @@
6
CCCTAGTCTACCTGTTTGGAGCGGGGCCTGAATTTGACTGGC
GTCTTTACCGAGTTAGTCTGATGTAAGTACTGCTCCTCTACC
CCGACATTGCGCTCTACTCTGCGCACATAACTAAACGTTGCA
CCTCCGTCTACATAGAAGGAGTCTGCAACGCCCCCACTGAGG
ATCTTGCTCGTATCTACCGATAAGTAGCGAAAATCTAGCGTT
CGGGGTTACCTGGCAGTGTCTACTAGATCAGATTGCCCGGCT
TTAGTAAATGAATCTACGTCTCTGAGCGCGCGAATCAGGGTG
TGAGCACTCTGACTTAACTCTACTACTCTCCAATAAGCGCTC
TCACGTTCTACACTAGGTAAGTATGCATATTTGCATGAGTCT
TTTGAAGAAGGCTCTACAAATTTAAACCCAGACTCAGACACG

6
chapter2/for_real/rosalind_ba2c.txt

@ -0,0 +1,6 @@ @@ -0,0 +1,6 @@
GTCACAGCTGCATAACAAGTAAACTGAGAAATCCCCAGTTAGGCGGATTGACCATCGAACACACTTTCACTACTTGCGGATAAATCCTGTAGAACTAGACTTTATCTCGGCTGCGACAAGACAGGAGTTCATGCACCTGCTCTGTCCCTCGCAACAGTCTAGGGAGCAAGTAGGCGGCTTCTTAGCTAGTACCTGGGTAG
7
0.393 0.286 0.286 0.25 0.179 0.321 0.107
0.071 0.357 0.25 0.286 0.214 0.393 0.357
0.214 0.214 0.143 0.286 0.25 0.143 0.25
0.321 0.143 0.321 0.179 0.357 0.143 0.286

26
chapter2/for_real/rosalind_ba2d.txt

@ -0,0 +1,26 @@ @@ -0,0 +1,26 @@
12 25
AGATCCGGTTTTATTCAAGCGAATTAGTGGGAGTGCGAGCATGCGCCAGATTCGTCCGGGATTGTCGTTAGGACACTAAACAGAGTCAGGTGCAGTGAGGAACCGGTCCTCCTTGCTGTCCATCTTTGGCTATCAATCGCTTTGCGGGCGGCATGC
CGAGCATCCCTTTAACATAATTGCCCGTGGGTGTATTGCGTTTTTCCAACGCATAAGAGCATCTTATGTGTTTATGCGTGGAAGCCTATCACTTTGCATAGCGTTTGGCGATCACCTCCATGCCGCAAGGCCTAAGGCACACGGTTAATTGGGTCA
AACGAGGCGAACCCTGGAACAGGTACCATGCCTTTGCGATTCAGCTTCTATCCCCGTCTAATTAGACATCTCAGCGTTCCTCAAGCTAGCAGACTGCACAGGGCTTATCCCCGGATGGTCGCTACTTCTCTGTGCATATAGCACGTAATGCCACAT
CTTCCCGTCGAAATGCTACATAGACTGAGCGATACATGCGGTGCAGTTAGTTTGTTGACCTTATCCCACACTACAACGGCCTGTTACATTGCGCGTGTCTTATGCAAATCGATCGCTTTGTAACCGTAATCCACCATTTCTGGAAAGCATTTCCAG
ATTAAACATTCCAGCAACACGCGGGCGATCCTGAGGAATCACCGCAACTCACGTCTAGAGCCTGTCCGGCACTCGATTACTTTGTCTTTCGAACCCCGTTGGTTAGTGCACTCTGTCATATAGTGCTAGGCTGCCCTCTCAGACGCGCTCAGTCGT
TCGGTGTGTACACCTGGTAGAGGAGGAACCAATTAAACTTCGTGAACCCAAGGCGGCCCCCCATTCAGTTCGACTGGGACTCCGGCGCTTTTATGCGCGCGTAGAGGCAGTGACAAGGCTTCCGGTTAAGTCTTCTTTACTGACGCCATGCCTTTG
CGTATCTCTGTTTAGGCTCCCACCCCGATACCTTTGTTTCTCATATGAGCGCTTGTCTCGCCGCCAGATATCTGACTGGTCCGGTGATCAATGCTTAGGCGTTCAGGTTTACTACTGTCGCGACAAGACGGTCATACGCGCCAAAGGCTTCACAGC
AAGCGAAGTCCTTTGAATACTAAAACTCACCACTGGGCCGTCCCGACTATAAGTTGTCGCGAACAGAGTTTCTGTTACTTACCTCACTATCTTGCATCCATTCCTTTGGGTATTTGGGTTGTACACGCTATACGATCATGATTAGTCTCTATGCCT
TAACAACGATGCGGTTCCGTAATCGTAGTGAGAAAACCGGGTAGGAAGTAAGTGTGCATGAACGTTAGGCGCGTCTTGAAGCCAGATGGGTAGCTGGCTAATGTTTCTGCCATAGGACTGGATCACTTGTGCCCAACAGGAACAGCAATTCCTTTG
GCGATGACTTTGACGGCAGATCCGACCTCGGCTTAGTATGGTGGATGAACCTCCAAGTCACCGGGTCCTAGCATTATTTCGAATGGCCGAGGAGGCCATCATTAGGTAACGCCCAGAGTACATCCCCCGAACACCGAAGGTCGTTCGCGTCCGGCA
CCTAACGTACCATTTTTGACTGGAAGCCAAAGTTGACCGGCTTTTATAGCTTTTGACGGTCTCCTGTACTCAAGTAGATTTTTGTTAACAAACCTGGCATTGTCGTCATACAGTCAGGGAAGATACTTCCCTAGCTGCACCCACCCAATAGCTTTG
TGCTCTGACCAGACGATGGCTTTGCTGGAGGTTGAAGGCCATTTTTTTGTTCTAGTGCCCGACAGCTTCATGAGGGCGGTCGACTCTGAGGCTTGAGCAAAACCTAATATAAATGCTGAAGCTTAGCGCACGGCACGGAAATTGGGGGGAACTACT
CGGAAGCGTTTATGACGGCGACAGGAGTAACCATGAAGAGGAACAGGCGCGACGATGGAACCGCCTTACTACGTTCCGTCACGCCACCCGAGTGGAGTCGGTACCGTTAAGCTGACGGCGCGCTATTCTCTCCTGATTAGGTTACCTATGCCTTTG
GATGTAGCCATATAAATCATTCATCGTTATTGTGGGCTCTTGTCTACCGTATACACACACCCAATCCCTTTGGGCATTATTCGACTATCCCCTACCTCGCCTACTGCTGATACCACGTTTTAGGCTCCGTTTCATATATATCCCCCTAAACAAGGG
GATGGAGCGTTGGCGAACCGCTGAGCGAGCTATGAACAGCCTGTGAGACGCGGGGTAGGAGCCATCACTTTGGATCGTTCCCAGTCTTTCTATTATCAGTATCGATATGCGGCAACCAGTTTTCTTGCGCTCTGAACCATCCTATAGTAGAACTTC
TCCTATACGTAGCCTCGTCCGGCCTGACGTGTCCGGATTCATTTAGAGGCCATTACTTTGCTGTCAGTCGCTGCACTCATGTCGATTGTCGTGGTTGATTTAAAGACCCGCATAGCACAGTACCCTAACCCCAACTTCTCTCTGTTTAGACAGTGC
GAGCTTTGTATGGAGATTGCGCTTCCGATTGCTTTGAACATCGGACGCGCTTATAGAGACACTCGTGCTGGCAGACCGGTGCGCGATAAACGAATCTCGGCGTGCATTGGTGTTTGGGCTTCCGATGTCAAAGACCGCAGAACTGCGCCGGGGAAT
CGATCTTCAAAGGCTGGCTTGCATTAGGAGGACTGTGAAGAACACGCTTCTCTTATGACTGCACGGCGGTTGACTACGTCGCTTTGGGGCCACCCTTTCATTGCATGAACAATACCTTTGGTCTTTGACTGATCTTGAGGAGTCCACCGGATCACT
ACATTTCAAACACACTGTATGGGTTACCCTAATTCGCTGCGCATGCGCTGGGCCTCGAGCGAAGAATGTACGTGCTTTAGCTACTGTCAGTCTATCCAACGAAACTACGGCTTACGTGGTTACAGACCCCATGCTGGTTGGGAATCGATTTCTTTG
TATAAAGAAGTAGGTCCGTCAGATTCGAGGAATCCTCGATGTCCCTGGTACATGCAAAAGTTCAGAGCCGTAGAACTACTGTAGGCGATTGCTTTGCGCAAAAGGGATCAGTCGCCGTCGTAACTCAAATTTAGTCTTTTCACCAACGTGCAGGGA
TTTGAGTCATTATTAACGGTGTACGGAGTGACGCCCCCAATGCCTTTGTCCGGCTTGTACCGGATTATCCGCTTGAGTAACTTATTCTTATCTGAGATGTCGGTGGATATTGCCACTTAATCGAAACGATCGTACCTCGCCCGAGTCCTAGCAGCG
CGCACGTGAATGTAGGAGCCAATCCGGCCTCTTTAGTGCTCCAATCACTAAGGGTAGATTTGTCGCACCACCCGTATGTGATCCCTCAAAGCGAAATCATCTACACTCTCCATAGCTTTGAAATCCAATAGTACAACCTCGGCCGGGTAATCACCA
ACCATATCTTTGCGGACTTCCGAAAAGATCGAAAAAATAGCTTACTGACCCCCAACCTTGAGGTAAGAGCGGTCCCTCGGTCAGGCGGAACTTCCAGTGTCCGATTAGATCAGGCCGCATAGTGTGGGACTCCGATCAAGTGTATAATATGCAGGT
GCGGGGGGAGTTTGCTAGGACAGTCGGGCGGTAGTTTGTGTCTTAAGTAACTGCTCGAAGGCTAGAATGTGGGATCATAGCTTCAGCGGATTCCTAGCGATGGCTTTGAAACATGGACGAGTTACTTTTGGCGTTTTTGAGAGTTTATAAGGTGAG
CCAAACATGGTGGTCACTATTAATTGTCCTCCGCGTACCGAGATACGAGGGGAGTCCTCCCACAATTCGTCGCCGATTTCTTTGAGTCAGGGTATCATAGGGAGTGCTATTCCATAGCGATAACTGCTCCACAGAAGTTCATTAAGTATTTTTTCT

26
chapter2/for_real/rosalind_ba2e.txt

@ -0,0 +1,26 @@ @@ -0,0 +1,26 @@
12 25
GATGTGCGTCACAATCCCGCCCTCCAGCTGAACTAGCCAATACTTCCTCTTTCTGCTTCATGATTCACCCAATAGACACTAGGGCTTATACGGGGTGTGTACTTCCCACTGTGGGGCGAGCTGAGTCCATAGTCATGGGCCCGCCTCATCTAAACC
TCATAGCCGCCGCGCTGGTGGGTGCAAGTCTGCCGACTCCCACCTTATATGTAGCAGGTCACGTAGATAGGAGTGTGTTATATCCTCCGATAATCCCTAAATATAGGATGATACTGGTTCTGCCAGACTCTGTCTGTCTTGAAGTCGCTAGATGAA
GCGTATAGAAGGAATCCGACTTGGACGCCATTCAGATCAACTAACATAAACAGCAGTAAGCACTTAGCTAGTTGATACCGACGCATACAGAGCGGTCGCGAAAGTCGAGACCGTTCCTTGGTTCAGCAATTCGTGGCTGCCGTCCTCTGGTAAGCC
ATTGTATGGAACGGTGATGTGCTACTCGCTTAAACCTATAAGCGCACATTTGCAGTCCATGTGATACCTTCCAAGATGTTATTGGCGTGGAAATTAATAAGACGAGACTATTCGTCCTCTGGGTTAAGCTGGTATTTAGAAAGATTATCCAGCAAT
CGGGCCGCAATGGGCGCCATTACGTTTTTGGTTTAATCATTGCTGGATCTCCCAACCCACTCGCCTGCGGGACGTTCGCTCAATCTCCGCCCACGTCTGTGCTCTATCTCAACGATAGTCCTCGACTATAGCCTCATGTAAACCCGGGAGACCTTT
ACACAGCGCCTCGCAGGGGGTGTGAGGACCGACGTTGTAAAGGCCATACCGACTCTCAAGGCATTTTGGGAACCCCCACCTGTCTGTGCCATTGAACTCTACTAACCCTAATTCACCTATTGCCGAGCCTTCGGATTTGACGAATACGCATTGCGG
AGCTAGCCGCTTGCCAAACCGCTCTCAGAAGGAACGAGCCTAATTCTCCTCACGTAATCCGGCCAGTTCATAGGTGTGCAAAAACTTAACACACGTTCGGGTGGGGTTAGCAGCGCAGAATTGAACCCCTTCACGCCGACAGGGTGGCTAACTACA
ATTCATGATTGCTGAGTCATTGAAACACACCAGAAACGTCAGCGGCTAGACTATGTACCAGCGGAAGCTGGGATTCCTTTAGAAGCTGTTCCCATACTTGGTGGGTGATCCTATGATACTCTCGGTTAATCCGTGCGAATTTAGCTGTCCCTGAGT
TGGTTGTGATATTAACCAAGTGCGCTCCCCTAAACCTCGAACCTGGGCCATTTAGATGACTTAACGCCGCCTAGCAGCGCCGCCGTGGTCTTACAACTAGATAGAACTGCGAGGCATTCTGCGGGGCGTAAGTGTCGTCACAGCGATCATGGATTC
AACTTTCATGTCCACGCGCAACCTTCGGTTTCGTCCCTCTGCTAAACCAATTGGTCATTTCTATTGCTGGACCCACCCGCAGGCGGTCGAAAAGAAATACGCTGCCCCGAATACAGCTCCGATTCCTCTTGCCTCCCAGACACAGATGGTAACTTA
GGTCCGCGTTGTAGTACACGTAAAGCCTTATCGGGGGTCCTAAATCTAGTACCTGGCAGACTCACATAAACCGCTCTCCCCGCCTGGAATCTAATGTCTGTAACTGATCCTTTTCCTTCATATCAATACCTCTGTATCAGTGTCCAAAGGTCAGCT
AGATGAGGTGCAGTGGGGGGAGTGATGCTAAGTGGTCCCACTGAACTCATTGAACCCATACTCGGTGGGACTAACATCTACAGCCGCACGTCAGAACTACAAACAAGATCGGCACGAACGTTCGCGACTATTTACATCTTAGCGCTCTACTAATCC
CGCGTTGTTGCCCCCGGATTACTACTCATAGTCTGGACAACTGAGACGATAGCTCATCATCGCGGTAACATGCGTGATTACGGGGAAAATTATGGGGTGACCCAGATCCGCTTCGACGCGCTCACCTAAGCCAACACGCGGGTATCGGTCTAGTAT
TCTGACGCTTTCCCCACGAGAAGATCATCCCGGTTGTTACGATCCTTCCTCCAATAAACCTGCGCCCCATACTGAGCGGTACCCTTTCAAAGGTAGAGTGCTACCATGCGATGTATCTGAATACCATAAACGTGTGAGTAGGATTGTGGGGGTACA
GACCGAAAATCGATGGCCAGACCCATCGAAGTCCGCCCCAAGTTGGCTATGAGTTTAATGTGCGCGTTGCTGCTATGAGGATGTCAGGCTGGGCCGAGGGAATGAGAAAGTTTCTGACCTCGAATGAGAGTGAGTCCTCCCGCACTCCGCTAAGCC
TTAAGGTCTCTGACCGTTGTAAAAATTGTGCCCCCCCATCTTGTAATCGCTAACGAAAGGAAATCTTCTTGACAACCACAACGAGTAACTGCCGACTCGGTGGGTAGCCAGCGGTTGACGAACTGGACAATGCTCTAGTAAACCCCGGTCGACAGC
TAAGAAACCTTGTCGACTGGGAAACTCGCGTAAACCTTTGCGGAGGCTCCTTATCGTTACCTGACTTCCAGAGGGACTACCGTCGAGTCAAGGGGTCAGTTAGGTAAGCACACGATGCATGCAACCCTTGCTGACTTCTTCAATTTCCGTGACCGT
GACCATCAAGCCCGTAAATCGATGGTATCCTATGGTATCGTCACCACAGCCTCGTCACTTAAACCAGGCGTAAACCGAGTGAAACGACTCGGCTCGCAGATAGAACCCGTGTAGAAAACCATGTGAAACAAATGAATCACTTTACTCGGGTAAGCC
TCAGTGTTAGCCTCGGGTCAGAACGGTCTATTGAAGTTAATGACACTCGGGTAGTCGGCGCTCAGCTAATCCGCCGCTAAGCCACGAGCAGGCGACAAAGGAGAATCTGGCGACGGGTAGCACGATAGTTGGGGAGGCCTCGCCTAGGTTAGACAG
CTCGACTAATCCGAATTACACTGCACCTCCGAGTGGGTCTGGTTACTGCCGGTAGGGAGGCCAAATAGCTTGCCCAACTCGATAAGTCCTATGACGATCGGGCTGTCATAACAAGATTAATAGGGATGTCAAACCGTAGGGTGCACAGACAATTCC
ATCCAATTCTCGATGGGATTACACCCTAACCAGACTGGGGATCCATAAGTCTTTTTGCCTCTCGCGTAATCCGCGCATGACTGTATACACTTCCCCAACGGGGGGTCAGTGTTTTCTATTCCCGCAGCACGCCCCGTCTAGCCGACCCGAAGGTGC
GATGTCAGATTACACCATTTCTGGCGCGTTTTGAAAGATCGGACCTTCATATGGGTTCCTGCTCAGCGTGGACCAATGAGAATGGAGAGCCATGAATTAGCACTACGACTCTCCTAAACCATGATTCTGATTTTCTGATCTTCCCATCAGCCGTAC
GCTAATCCTCCTGAGTTACGAGCATCCATGCGATAAACAAGACCGATTCACATCCAAATTGGCCGTCTCTGTGATGCTGGGCCGGTGAAGTTGACTTCGTAGAGTTTATCTCCAGCCTGCAACCTGAAGGATCTCGACTAACCCTTAAGCGAGCTG
CCGCTCGAAGATCCCCTCTTGCAGCACGGTGCAGGTTCGGCAGGCTGAAGTCTACACCGCTTTGGTGACAAAGCGAATGACTCACTTAGGCCCGCGCATAGGGCAACCGTACATCACCGACAGAGTGTACAGCTCGGCTAAACCAGACATACGCTT
GGGGAGCGGTGTCGAAAGAGAAGGCATCTCTGAAGGAGTTAAAAACCACGATTTGAAAGTCCTCTGTATATGCTCGCGTAAGCCTTGCTTTTCCCACTGAGGCTACACAGGCGAGTCCAGCTAATGACGGCGTTCTCATCTCAATGTTGGCGACTG

21
chapter2/for_real/rosalind_ba2f.txt

@@ -0,0 +1,21 @@
15 20
CCCCGAGTAATTCCCAGATATAACGTACTCAAATGTTGAAAACAAGTGACCACTGTATCCACGAGGGGTGTAACTCTTATCAATGGCATAAGGGCCACGAGGACTCTACCATAAGAGCACAGCGCCAGCTGACGAATGAGTATCTCTGACCAACCGATTGCGATCTGTTGTTGGCAGATAGGCCCGCAGGACCCCGAGTAATTCCC
AGATATAACGTACTCAAATGTTGAAAACAAGTGACCACTGTATCCACGAGGGGTGTAACTCTTATCAATGGCATAAGGGCCACGAGGACTCTACCATAAGAGCACAGCGCCAGCTGACGAATGAGAGATTTGACTACTAGTATCTCTGACCAACCGATTGCGATCTGTTGTTGGCAGATAGGCCCGCAGGACCCCGAGTAATTCCC
ATAGTGCGTACACCACAAGTGAGATATGATACTTCGACCCAGAGGTAAAGATAAGATCTAGTATTAACCCCGGAGCGAAGGGAGAATGGTACGATCTTGAACAGACTACTCATCGCCGATATGAGTCGAAGATAATGCTGTCATCAAAAGTGGCTTTGTTGAGGTTAACACTGTAGACTGGATGCAAGGCCGATGAATTATAAGTC
CGGGCTCGGAGAACAGACTAGGGGTACGAAAAGGTTCCGAAATTAGCACGCGCGCGTATAAAAAGATCGACGACCATGCCCGAGTTAGCTCACAGGAACAACTTTGGATAGTTAGATCCCAGCTGACAGTTCGAACTACGCAATCAGGGCTCCTCTGGATTCATACTCTAAGCATGAGAAGGCACAGAGCAAACAGCTACTTGGAT
CTTCAGTTAATGATGCCTCAGAGGTCGGCGTTGAACCGCGTAACAGACTACTATCTTTATGCGCAGTACAGTTGTAATATGACTAAGGCGCCCGCGAACCGTTCCAACGTGCCGAGAAGGTTGGCCTACAAGGAAGAAGCCGGTCATTCAGTCTTCAAGGCCAACGGTCCTGCACAGATGATTACGCACCGATCAGTATAATGTGT
CCATTGGGTGAGTTGATTCCATGATTCGTAGAAGCCACTACTAGGTGAGCTAGGCTCCTCTACAGTATAGAGAAGAGCCTTTAAGCCTATCCTGGAGCCTCTCACCCCACAATCGTAAGAACTTGGGTGCGTGAATGACTGAAGTATACATCACCTTAACTCATATTGTTGATCCGCTGTTGTCTGATTGGTAGGCTTGGTAGCGC
CGAGCGCTTTTTGCACACCGAACGTGTCAGTTCCACATGAGCGTGACAGAGTGCCCGCGCATGGGGTAATCCCGTATCAGAACAGTAAACTAGGTCATGTCCTCCATCGTCTTAGAAGGGGACAACCCCGCAGGGTATGCTAAGAAGTGGAGTAGAGAGTGTTGTGCTGAACACGCGTATCCGGCGGTTTCAAAGTCCAAGGTTGA
TGTCGTCCCTCTTCTTTTCACTCACATGTATGCCGCTAATACAGACCAACTAAAGAAGAACCAGCTACTAGTGCCATACCTCAAAAGCTAGAACTGTAATAACACACCGCTCGTTGTGGGCCGATTGTATATTAGTAAAGCAGCCAATATTTGTAACACGGCCGATGACCGTGCAAGTTTCCCTTGGAGGCAATGGCATCAAATTC
TGAGGTGATTGTTTATCCAGATTGGCTGTTTGTCTGAGAAGCTTGTAGCGAGATTCGACTACTAGCTATAGCGACTCAAAATGCTGCGCATTCCCAACTAAAGTAAAACGCAAGCTTAGATGCAGAATTGAGATCACTTGTCTGGATTCATTTTTAATGTGGCGCTACAGGTGCATGTCATGCCCGGTAGGTTGAGAGGCTCTCAA
ACGTGGAGCGATCCTACCGGTGTATGTGTACCATCCTGGCTGAAAGGCAGTCGACTGGCCACCGTTCGGGCGGCTGCGTTAGAGCTGACTACTCGCATAGGTCTAGAGCGATGACCCCTTGTTTGTCAGCGTATATCCTGGGTAATCGTTGTACCGCCTTTCAGGCCGATCCTAGGCAAGACTACTAGAACTAGGAGAACATTGCG
TAGACCGCCACCCAGGGTGCTTCCTTAGATCAATCCCGCGTGTAATTAAGCGTAGGGGAGACGCTCCTGAGAACAGACTCTCAGCTAAGCGTAGGAGGAGCATTTTTTTTGGATGAGGCCTCCTCTGTGGATACAAACGGCTCGGTCAACCAGCCACACGAATCAAGTGGAGGATATGTTGTAGTCCGCTCATTCCGAACTTTTAC
GATATTTACGCAGTCGGACACCCCTCTTCCTTAATGCTTCGTTAAAATATCGGTCCGTGCGAGTTCTGGTGGCCCGTAGGCCTCGCTACTTAGAGGTGGGCCCCCAAGGGCACGCATAATCGGCGCCTACCGCGGTAGATCAAAGTGAACGAAGATCTTGTAAAGATTCTATGCGGAGGCAGAACATCTTACTAGGTAAGAGATTC
GCTAAAGCTATCCAAGGATACCGGGCGGTGCTAAACTATTACAGTATAACATCAATCAATCACCGCTTCGCCTTCCTCGTAAGTCATACTGCATATGGGCTTCCACTATAGAGTAATCCCGTGAGTGGACGAGAAAGCACTACTAGACAACACGGACACCGTTACAGTCATAGTTGCGGCAGTGCTCATAAGTCCTTCACACAGGG
TCCCGCTATATCTGATCTTTGTGACTGCTGGGGTATAACTCAGCTGGCACAGAACAGACCCGTAGTCCAGAAAGACTTGAGTATAATGACGCCATGCTCGACCGGAGGTCAAGTACGGGAGAGGCTTACGCAGAATCCCCCAAGAGGCTCGTTAACTGACCGCAATCGATTTCAATTCCCTCCGAGCTCACCGAGCGCTGGTATAG
TTCCTGACTGCTTGAGCCAGCGCTATATTGCGCGCAAACCAGTCGCGTGACGTACCGTCATACCGAGTGATGTAAAGTTGATTATTGGGATCAGCTAATTCCTCGCGGTGTTAGTTCATCACTTTTGAGTCCGACAGACTACTAGTTCACTACTTCGAAGTTTGCTCTTACAAGCGAGGAACGGCCTGCCGAGTATACAGGGGCGT
CAGGCCTCGCGATTAACCTTATGCGGCCTCTCAGAGCCCCGTCTAGCGAAGGTAATATGAGAACAGACTACAGAGACTACGCTCTGCCCCCTAAGGACTGGGAATTTATGGCCCTATATCGCCCTTATTCGCCATAACTTCTAAGATTTGCTATTACCATTCTGAGGCAGTTTAACTAAATGGCTATCCAACCTATGAGGACTGAA
GATCCAGAACAACTTACTAGTCCGTAGGGCTGGACCTCTTCATGCCCGGGCGTCGGGCACATACGCGTATCAAAATGGGAGATGGCATTATCTACTTCTCCTGTGATTTTGAACGTAGTCACCCCACATCATACTTTTCACACCTCGTACTGGTGATTCCATTCTACCCAACGATACGTAATTGACCCCGCTTTTGATTGGAATTT
CAAGAGTCGTACGAGCCCTCCGTCATCAATGCTTGCGATTAGAGTTCACGGTAGAATCCACCAGAGCAGAAGAGAACGCCAGTAGCGACCAGAAAGCCTTTAGAAAAGGCAGACTACTAGAATGTTGTGTGTAAGTGTACCAAACATTGATTCGGACGGTTGTCGTGTTCGAACCAGGTGATTTGGTGAGGTTTCAGCGCCTAGTG
TTGTGATGATCTTCGTAAACATCCGGTGGGAGCCCCCCTCTCCTCGATGACTGTTTGACTATGATCCATTTACCTTGACTCGCAGGACGACAAGCCATTATTCATGCCGTGTGATAAGAACGCTCTACTAGGTTCTGTCCATGCCCAGCACAGATCAAGGGACCCGGCGGGCCCGGGTCAGAACTTTGGTCACCATTCGCAAATCA
AGAGGCACCTGGCGCCAAAGGCATTTAATGAACATGGCGAACTGCCAGACGAGCATGGTAGAACAGAAGCCTAGGACCATCCCGACATAACAACCACTATTTATAATTGAACTATCTTGGCACACACGCTATTGGCGTTGCACTGAGACCGTTCATCGCCTTCACTGTGACCATTCGCCTATAGACATATAACTAACTTGGCTTCA

21
chapter2/for_real/rosalind_ba2g.txt

@@ -0,0 +1,21 @@
15 20 2000
AGTGGTTTAATCGGACGAGGCGTGTCCCTCAGCCCGATAACCATCCCGTCCTGTGTGCGACCGTTGAGCATCGTATTAGTTCCGTAGGATTTTGCGGTCGTCTATTTGATATAAAGTCAGGTATATATGGCCACAAGTTCGCGTGGACCGTTAGCGCACCAACACTGTAATATAACTGCCTTAAGGTAGCGACTCGCCAAGCGCAGGGGCAGCCCTGACAGTTTCCACGAAACTCAAGAGAGTATGTAGCGACAGTCCTTCGCAAGACAATCGTACGTGTCTACCGAAACTTAATTTCGTTAGTGGTTTAATCGGA
CGAGGCGTGTCCCTCAGCCCGATAACCATCCCGTCCTGTGTGCGACCGTTGAGCATCGTATTAGTTCCGTAGGATTTTGCGGTCGTCTATTTGATATAAAGTCAGGTATATATGGCCACAAGTTCGCGTGGACCGTTAGCGCACCAACACTGTAATATAACTGCCTTAAGGTAGCGACTCGCCAAGCGCAGGGGCATCGAAGAGTGTGCATGCCCTGACAGTTTCCACGAAACTCAAGAGAGTATGTAGCGACAGTCCTTCGCAAGACAATCGTACGTGTCTACCGAAACTTAATTTCGTTAGTGGTTTAATCGGA
TCGCTAAGTGGTGATACCGGCTGATAAGAAAGTAAGATTTCAGCATGACCCTGTTGATTCCACCCCTTCCTTTCATGGTGAGGCTTGTCTTTGCGGCGCCTCACGGTACCTGTGGACTGTACACACGAAGCACAACTTCCGAACTATTCGTTTGTAGACTATAAATCACCATGCTCATCAAGCTCAAAATTTCTCCTTACACCGACCGCGGTGGGAAAAAACGCAACGAAGCTCCAATTATCTCCAGTCTCTGCACGTGTAGAGATGGTGGAAAGCTAAGAGATGCCTTCGCCACATTAAGTCCCGCACAACGTTA
ATTGGCAAAACCGATAGGATCCCGCGACTATGACGTCGCTTTTCGCTAAGTGTGGGCTGACCCTCCTACAAATAAGTCTCGTTTTAACCCTGGCCATTGCTTACAACCGCCGAAGTCGCGCTTCAATCAAAGGTGCAGGGTTATAATAGACATACAATTAGGATGTTTGACCGACATGCCTTGTTAACTTTAATTGACGTTACAGATTGATTATGCGATCTCTTTATGTTCTCAATTTAATATACCTCCGCTGGTTCCTATTGGGAGCCTTCAACACATAATAAATCCTTGTACCTCTGATTGAGTCTCTTTGCCT
ATGTTCCTTAAGTAATTAATAGTACGTACACCGGTATTCGCTAGCCGTGCATCTTGACCCCCCCAAGGCGAACAGTTTGGATTTGCGAAGTCCCACGAAGGGGGCTTAAGGCTTGAGCCACATCCAGTTATGAAAGTATATCATCGGCACCCAGGAGGCTAGACAGGAGGTCAGAAAATTCCGCATTAGCGTCGTTGCGCAAGGCCGTCGCCGCCCGTGCTTCCAGGATTAGATCGCCTGCCAGACAGTCCGACTCCGTTGACAATAGAAGACAAGCTTATGCCCCGATTCACTCACCACCCAGACAGGCCCGCAT
TGCTCGGACTGATATCCGCGTATGCGTACGTAATGTCTAGCAGGCGGTCGAGCCATAGGCTTCAATAGGGGTGTTGCGACTAAGCGATTGGCACAGGAAGCATTGGAATTAACACCGCAGTCATCTAAGTGTGCATACGGGCATGTGGAGATTTTTCTACGAATGATGCGTCAACGACCCATGGAATGAGTTTTTAGTTGTTACCCATTTTTATAATAACGTGCGCGGTTTATCTTATCCCTTATAATGATCTCTAACATAGGCGTACCTGAAAAGAATGCATTAAGCCGCAAAGGAGCCCAATTCTCAGCCGTCG
AATCTAAGTTACTTCATGGTTCACGGTGCCACATCGACTGATCATCCATGCCTAGTGCTGTACTTAACCCATCATATTTCCTAAGTGCTTCGAACCCTTCGATCGGGGTGGTCATCTGTCCGTGACAAGGCTGCTAATAACCCACGCCGGTATAACACTGATTACGTTATACGCCTTATTCGGCAGTGACTGGCGTGCCACGTGCCAGGTAAAAAGAAATCTGGAACAGGGCTCCTCTTTCATAAGTGTGCATTTAAATAAGGGGAATAGTAAGGTCTCATTTGTAGTGCACGTGCCTTTCAATTATAGGCCCATA
CGCCATCCACGTTTTAGTAATGTACCCAGGCCAACTAACACATAGCAACCGTCAGTTTTCACAGTTGTCATCTTGCCGCCCGAATAAGCCCCGCTGACCAACGTCTGAGGACGTTCTCCGCGGAGATGAGGGTATAGCGTCGTCGTACCTGCATTACCGAACAACTCTCCATCTTTAGGGAATTACCCATCTTAGCTATAGGACACAAGAGTCGACAGTAATTGTGGACTGGCTTTTGCGGTCTCGGTTCAATCAAGGAAAACCCTCTTGCACTACAATCGCAGCGTGTGCATTCAGAGCCCTTATCATACCTCGA
AGGCGAGGTGAGGTCATGACCTGTCTAACCCCTTAGCGCCGGTGTAAATTCAATGCACGTAACGCTAGAGGCCTTAGGCCTCGATTCATCCTTGTGATCCATGTAATCGAACGACGCCTATCTAGTTCCGGAGCTTCGAACGAGGCCGATTAAAAACCCGTTGGCCGGTTGATCTGTGCTGCTCAAGATGTAACATCCCGAACTCAGCGCATCACGCCCCGCCAGAGCCTTTGGGAGCAGGGCCAGCGCTCGCTGGTTGTGCATGCGCTGCAATTCAAACACCTGTGGCACGAGTGCCCCAAAGGACCATCATCGA
TTGCAATACAGTGCCCCTTGTGCTGTTTGCTAGGCGAATACAGGCGACCGACACAAAGCCCGGCCCTATATCAGTACGAGGCAACATACTGCTCGCTAAACTAGCGTATAAATTTGGACACCATAATGCGCAATGCACGCGGTATAGGTGGTCTTGTGGTAAGAGGGATTTCTAAATATCGTATTCCCCAACGCAGGTATACGAAGCCCATGGTAATTTAAGCGTTTAAACAGCTAGAACTCGCTCGCTCTTTGTGCATGTACTAGGTCCTTGTGTAGAAGAGGCGCAATGCCTTGCTATAGACCTTTGTCCCGTC
GTGCGAAAATGCATAATTATAACTTTTCGCCTCGGGCGCGTCCACGGTATTACGAAGTCGAAGCGCGCATCCACTGACAATTCACAGACAGCAAAAATTGTTGCATTTAATCACGGGGACCCTATGGTGCAGTTGCGAGACCACATATGACCGGCTTTGTAGCACCGGCCCCAGTTTAATTCCCCTGACTAATGGTAGATACGACGACCGCCCCCCACAGACTCACTCACCTCCAGCACCAGCTCAACGGTGACCCCTTTCTGTTCTAATGCGGTATTCGCTAAGTGTCACTAAGGTTAGTGCGGCTTTTGCTGCA
GTAGTCCGTGGCATATGGAAGGGGAGCTTTACTCCCTGATCGGTGAGTGTGGAGACGTTTAGCGTACTGGTCACGGCAAGAGACGTTGTGAGTGTTGTATATGTTTCTTAGGAAAGCGGACGGCGTTACGCATGAGTAAGACGGCCTAAAGAATGAACCATGATTGATAATCTATTAATTGTTAGGTAAGGATAAGCAAAAAGGTGCTGCTGGGTCTTCAAGTAAGGGATATTCCCTCGCAGTGTGTGCATGACTCATATGGTTAACCCGTCTAAGCAAAAATCTCGACAGGAAGGTGAGTCGGCGCCATATGAGG
GGCGACAGCTGGTAACACGGCTATCGGGGCCGAATTGCCGACACTTGGCGCATCGCTGGAAGTGCTTCAGATAGTTATGACGGTGAAACACGCTCCGGCACAGCCTATAGTATGTTCGTAGCGTACAAAAGCTAGGCAGCCGTAATATCAGCGCTTAATGCTTTAATTGGCATGTGTCCTATTTGTGACGTTGATGTCGATGATAATCCGCACAGAACAACTCATGCATATTCGACGAGTGTGCATGATAGACACTGCGTTGTTGCCATGTATCTCCTGAAGCACGCACACAACGAAGGTCGCGTGCTTTTTCCGG
ATACTACGATCAAGCGAAAAGGGGACAATCGTTGGCAGGGTCACTAGGGCAGGGTCTTAGAGAATCAGTGCAACGTTTTAATTCGCTCACCTGTGCCGCTAAGTGTGCGATGTGTAATCTCCAATGGGAAATGAATCCTTCGGCTCGAGTAATGATTGCGTATGGTATTGGCCCAACGTAGGACTCAAGTCCCTGGTCGTAGCGCGATCTGTAATGTAAACTTACCATACGAGCAGGCTACGTTGAGGAGCGCTCGCGGAACGGTATAGAAAACGGTACGTCATATTGGCCCTTGTGACTCCTCCTCCGCTTGGAT
TTTTATTACATCTGTTCGCTAATCCTGCATGGTAAACAGTTACAGGAATTGCGATACTCCACTGGGCCACCACTACTTCACTAACTGGTAAAATGCCGGTCAGCTCATAGGTCCAAATAACGTTTATGGGTGTTAGCAATGTTAGCGATGCGATCTGTAAGATCCGAATCGTATCACAGCGCGCTCCTGCCAACGTCACCTGCTCCACCTAGCACTTGTCGATAACTCCCCCTCCAATCTACAAACACCAACTCGAAAATTGACACGCGCTTCTCGGCTGTTGGTGCACTCTGATGTTAATATGATGCCATGAGCC
TGCCAGATACCTACCCTCTATATTCCAAACGTGAGTGAACTAAGTTCGATTACGAACCCGTACGTGGCTGAGCACGGCTAGTACCGGCCCGATTGTACCGTTCATATTGATTAGAGTGACCGGAGCACACAATCATCGCCCCCTTAAGCTATAACTGTCCCCGGAGCCTGAGCCTTTGACAACTTCGATAGGTTAATAAAAGAGTCTCGCTAAGTAGCCATATGATGAGTGATAGGGAGGCCTAGACCTGGACAACCCCTCATTTACGTCCGAACTCGGAGTGAAGTGTAATGTGAGCTCTTAAAAGGAGCTGGAT
AGGCTCTCTGATTTAGCGGTGACCGCGTCCCGATTCTCACTCCTCAGAAGGTCTGGTAGCGCTACGGGGATGGGAACCCAGACACTCGAATCGAATCGGTATGACACTAGTACAAGGGGGCCTTTAACGCACGAGAATAACACAATTCCTTCGCATACATAGCTAACAGCAGACATAGGTCTTGATAAAAACTGTGCTGCTTCCTCAGAGTCGCTAAGAAAGCATGCCAGTGCACACTGGACATTACGCCGCAGTACAGCAATTCGCGTCTCAGATCAACCTGGGGAAATAAAACGTCTTTGCGTTAGCCCTTTGT
TGCCTTGATGTCCAGGCATAGGTCATGTACGGGCTACCGTCATGTCCATGTCAGAATCCGAATGATCCTCTGGATTCCGATCCCGGCAGAGGGTAACTGTGCGACCTCAGCTTCCTCATCCCGCTATCGCTCACGGCCGGTCCTAGTGCGGCATGGATAAGCTCATCCAGGATGATTTACCCAAACCCTTTCACGTGGTGGTGGGGCGCGACTGTCACGCAGGAGAGTCGCCCGAGCTGTGGGCAGGAATACTTTCCTAAAGTGTGCATGTTAGGAAGAGACGTTAACTGCGCCTCCCTATCCTATCTGAGTGGCG
ATAACGACCTGTGTGTTCATCGTATCTTCTCGAACACTTATGTAGATTCGCGTGGCTACGTTGTACATTCACTCCACTCAAGAGCGAAGGGTGACGTTTTCACTCCTCGCTGGAAAACCTAGAACGGGCTGTTTTTTACGATCAAAACAAAACCACTTGATAATTGTACTATTGTCTGGTAAGCTAAGTGTGCAATCAAGATCAACTCAATCCCCTGCCACCATAGTGTGGGCACCACGTAGAGAATTCGTCGAACAGATAACGCAAACTGACAGGGAGCTTAATGAACCATCAGCCGATCACCTTCGTGAGCATC
TCGCTACTGGTGCATCCTATCTATTGATATTGACAACCCGGGATTAGTGACAACCGATTTCAGACTAAACTAGTTAGTAAAGCATTTCTCTATCTCCGCCGAGTGGACGGTGATCTAAGCAAGTAGGTGCCAGGAGGCCCATAACCGCCAATGACTTTCATGATCTAATCGACGGTTCGTTTTGAGGTTGGGGTACGCTCATAACCTTTATGTTTTGGTACACGCCTGTCACCTGCGCCGTGGTATCTGAGACATTTGTCTCCTGGACTAGTTGATTCCAGTATTCACAGAACGCCGGGATACGTTTCCGTCAATA

77
chapter2/populate_templates.py

@@ -0,0 +1,77 @@
import jinja2
import os


def main():

    # Jinja env
    env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'))

    # One entry per Rosalind problem; each is rendered into ba<chapter><problem>.go
    problems = [
        {
            'chapter': '2',
            'problem': 'a',
            'title': 'Implement Motif Enumeration',
            'description': 'Given a collection of strings of DNA, find all motifs (kmers of length k that appear in every DNA string with at most d mismatches).',
            'url': 'http://rosalind.info/problems/ba2a/'
        },
        {
            'chapter': '2',
            'problem': 'b',
            'title': 'Find a Median String',
            'description': 'Given a kmer length k and a set of strings of DNA, find the kmer(s) that minimize the L1 norm (sum) of the distances from the kmer to all of the DNA strings.',
            'url': 'http://rosalind.info/problems/ba2b/'
        },
        {
            'chapter': '2',
            'problem': 'c',
            'title': 'Find a Profile-most Probable k-mer in a String',
            'description': 'Given a profile matrix, find the k-mer in the given DNA string that the profile is most likely to generate.',
            'url': 'http://rosalind.info/problems/ba2c/'
        },
        {
            'chapter': '2',
            'problem': 'd',
            'title': 'Implement GreedyMotifSearch',
            'description': 'Find a collection of motif strings using a greedy motif search. When several kmers tie as profile-most probable, use the first-occurring one.',
            'url': 'http://rosalind.info/problems/ba2d/'
        },
        {
            'chapter': '2',
            'problem': 'e',
            'title': 'Implement GreedyMotifSearch with Pseudocounts',
            'description': 'Re-implement problem BA2d (greedy motif search) using pseudocounts, which avoid setting probabilities to an absolute value of zero.',
            'url': 'http://rosalind.info/problems/ba2e/'
        },
        {
            'chapter': '2',
            'problem': 'f',
            'title': 'Implement RandomizedMotifSearch with Pseudocounts',
            'description': 'Re-implement problem BA2e (greedy motif search with pseudocounts) but use a random, instead of greedy, algorithm to pick motif kmers from each DNA string.',
            'url': 'http://rosalind.info/problems/ba2f/'
        },
        {
            'chapter': '2',
            'problem': 'g',
            'title': 'Implement GibbsSampler',
            'description': 'Compute the probability of each kmer in a DNA string under the current profile. Use these probabilities as the weights of a biased random number generator, which GibbsSampler uses to pick a random k-mer.',
            'url': 'http://rosalind.info/problems/ba2g/'
        },
    ]

    print("Writing problem boilerplate code")

    t = 'template.go.j2'
    for problem in problems:
        contents = env.get_template(t).render(**problem)
        fname = 'ba'+problem['chapter']+problem['problem']+'.go'
        # Never overwrite a solution file that already exists
        if not os.path.exists(fname):
            print("Writing to file %s..."%(fname))
            with open(fname,'w') as f:
                f.write(contents)
        else:
            print("File %s already exists, skipping..."%(fname))

    print("Done")


if __name__=="__main__":
    main()

49
chapter2/template.go.j2

@@ -0,0 +1,49 @@
package rosalindchapter{{chapter}}

import (
	"fmt"
	"log"

	rosa "github.com/charlesreid1/go-rosalind/rosalind"
)

// Print problem description for Rosalind.info
// Problem BA{{chapter}}{{problem}}: {{title}}
func BA{{chapter}}{{problem}}Description() {
	description := []string{
		"-----------------------------------------",
		"Rosalind: Problem BA{{chapter}}{{problem}}:",
		"{{title}}",
		"",
		"{{description}}",
		"",
		"URL: {{url}}",
		"",
	}
	for _, line := range description {
		fmt.Println(line)
	}
}

// Run the problem
func BA{{chapter}}{{problem}}(filename string) {

	BA{{chapter}}{{problem}}Description()

	// Read the contents of the input file
	// into a slice of lines
	lines, err := rosa.ReadLines(filename)
	if err != nil {
		log.Fatalf("rosa.ReadLines: %v", err)
	}

	// Placeholder so the generated stub compiles;
	// remove once lines is used below
	_ = lines

	//// Input file contents
	//input := lines[0]
	//pattern := lines[1]
	//result := rosa.PatternCount(input, pattern)
	//
	//fmt.Println("")
	//fmt.Printf("Computed result from input file: %s\n", filename)
	//fmt.Println(result)
}

60
chapter3/ba3a.go

@@ -0,0 +1,60 @@
package rosalindchapter3

import (
	"fmt"
	"log"
	"strconv"

	rosa "github.com/charlesreid1/go-rosalind/rosalind"
)

// Print problem description for Rosalind.info
// Problem BA3a: Generate k-mer Composition of a String
func BA3aDescription() {
	description := []string{
		"-----------------------------------------",
		"Rosalind: Problem BA3a:",
		"Generate k-mer Composition of a String",
		"",
		"Given an input string, generate a list of all kmers that are in the input string.",
		"",
		"URL: http://rosalind.info/problems/ba3a/",
		"",
	}
	for _, line := range description {
		fmt.Println(line)
	}
}

// Run the problem
func BA3a(filename string) {

	BA3aDescription()

	// Read the contents of the input file
	// into a slice of lines
	lines, err := rosa.ReadLines(filename)
	if err != nil {
		log.Fatalf("rosa.ReadLines: %v", err)
	}

	// Input file contents:
	// first line is k, second line is the DNA string
	k_str := lines[0]
	k, err := strconv.Atoi(k_str)
	if err != nil {
		// Pass k_str as an argument, not as part of the format string
		log.Fatalf("Error: string to int conversion failed for %s: %v", k_str, err)
	}

	input := lines[1]

	result, _ := rosa.KmerComposition(input, k)

	fmt.Println("")
	fmt.Printf("Computed result from input file: %s\n", filename)
	for _, kmer := range result {
		fmt.Printf("%s\n", kmer)
	}
	fmt.Printf("\n")
}
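
The diff calls `rosa.KmerComposition` but does not show its body. As a rough illustration of what "generate a list of all kmers" means here, the following is a minimal sketch, not the repository's implementation. The name `kmerComposition`, its signature, and the choice to sort the output are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// kmerComposition is a hypothetical stand-in for rosa.KmerComposition.
// It returns every length-k substring of input, in lexicographic order.
func kmerComposition(input string, k int) ([]string, error) {
	if k < 1 || k > len(input) {
		return nil, fmt.Errorf("k must be between 1 and %d, got %d", len(input), k)
	}
	// A string of length n has n-k+1 windows of length k.
	kmers := make([]string, 0, len(input)-k+1)
	for i := 0; i+k <= len(input); i++ {
		kmers = append(kmers, input[i:i+k])
	}
	sort.Strings(kmers)
	return kmers, nil
}

func main() {
	kmers, err := kmerComposition("CAATCCAAC", 5)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, kmer := range kmers {
		fmt.Println(kmer)
	}
}
```

Duplicated k-mers are kept, since a composition is a multiset of the windows of the string rather than a set.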

54
chapter3/ba3b.go

@@ -0,0 +1,54 @@
package rosalindchapter3

import (
	"fmt"
	"log"
	"strings"

	rosa "github.com/charlesreid1/go-rosalind/rosalind"
)

// Print problem description for Rosalind.info
// Problem BA3b: Reconstruct string from genome path
func BA3bDescription() {
	description := []string{
		"-----------------------------------------",
		"Rosalind: Problem BA3b:",
		"Reconstruct string from genome path",
		"",
		"Reconstruct a string from its genome path, i.e., sequential fragments of overlapping DNA.",
		"",
		"URL: http://rosalind.info/problems/ba3b/",
		"",
	}
	for _, line := range description {
		fmt.Println(line)
	}
}

// Run the problem
func BA3b(filename string) {

	BA3bDescription()

	// Read the contents of the input file
	// into a slice of lines
	lines, err := rosa.ReadLines(filename)
	if err != nil {
		log.Fatalf("rosa.ReadLines: %v", err)
	}

	// Trim each line; the trimmed lines are the
	// overlapping fragments of the genome path
	for i, line := range lines {
		lines[i] = strings.Trim(line, " ")
	}

	genome, err := rosa.ReconstructGenomeFromPath(lines)
	if err != nil {
		log.Fatalf("Error when calling ReconstructGenomeFromPath(): %v", err)
	}

	fmt.Println("")
	fmt.Printf("Computed result from input file: %s\n", filename)
	fmt.Println(genome)
}
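
The body of `rosa.ReconstructGenomeFromPath` is not part of this diff. The sketch below shows one way a genome path can be stitched back together, assuming consecutive fragments have equal length k and overlap by k-1 characters. The name `reconstructFromPath` and its error messages are hypothetical and chosen only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// reconstructFromPath is a hypothetical stand-in for
// rosa.ReconstructGenomeFromPath. It assumes every fragment has the
// same length k and that each fragment's first k-1 characters equal
// the previous fragment's last k-1 characters.
func reconstructFromPath(path []string) (string, error) {
	if len(path) == 0 || len(path[0]) == 0 {
		return "", fmt.Errorf("genome path is empty")
	}
	k := len(path[0])
	var sb strings.Builder
	// Start with the whole first fragment, then add one
	// new character per subsequent fragment.
	sb.WriteString(path[0])
	for i := 1; i < len(path); i++ {
		prev, cur := path[i-1], path[i]
		if len(cur) != k || prev[1:] != cur[:k-1] {
			return "", fmt.Errorf("fragment %d (%q) does not overlap %q", i, cur, prev)
		}
		sb.WriteByte(cur[k-1])
	}
	return sb.String(), nil
}

func main() {
	path := []string{"ACCGA", "CCGAA", "CGAAG", "GAAGC", "AAGCT"}
	genome, err := reconstructFromPath(path)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(genome) // ACCGAAGCT
}
```

Checking the overlap at each step costs little and catches input lines that are out of order or truncated.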

54
chapter3/ba3c.go

@@ -0,0 +1,54 @@
package rosalindchapter3

import (
	"fmt"
	"log"
	"strings"

	rosa "github.com/charlesreid1/go-rosalind/rosalind"
)

// Print problem description for Rosalind.info
// Problem BA3c: Construct the overlap graph of a set of k-mers
func BA3cDescription() {
	description := []string{
		"-----------------------------------------",
		"Rosalind: Problem BA3c:",
		"Construct the overlap graph of a set of k-mers",
		"",
		"Given a set of overlapping k-mers, construct the overlap graph and print a sorted adjacency list.",
		"",
		"URL: http://rosalind.info/problems/ba3c/",
		"",
	}
	for _, line := range description {
		fmt.Println(line)
	}
}

// Run the problem
func BA3c(filename string) {

	BA3cDescription()

	// Read the contents of the input file
	// into a slice of lines
	lines, err := rosa.ReadLines(filename)
	if err != nil {
		log.Fatalf("rosa.ReadLines: %v", err)
	}

	// Trim each line; the trimmed lines are the k-mers
	for i, line := range lines {
		lines[i] = strings.Trim(line, " ")
	}

	g, err := rosa.OverlapGraph(lines)
	if err != nil {
		log.Fatalf("Error when calling OverlapGraph(): %v", err)
	}

	fmt.Println("")
	fmt.Printf("Computed result from input file: %s\n", filename)
	fmt.Println(g.String())
}
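
`rosa.OverlapGraph` and the graph type behind `g.String()` are not shown in this diff. As a hedged illustration of the overlap-graph idea, the sketch below links k-mer A to k-mer B whenever the last k-1 characters of A equal the first k-1 characters of B. The function `overlapEdges`, the `"A -> B"` line format, and the assumption that all k-mers share one length are choices made for this example, not the repository's API.

```go
package main

import (
	"fmt"
	"sort"
)

// overlapEdges is a hypothetical illustration, not rosa.OverlapGraph.
// It returns one "A -> B" line per edge, sorted lexicographically,
// and assumes all k-mers have the same length.
func overlapEdges(kmers []string) []string {
	// Index the k-mers by their prefix of length k-1 so each
	// suffix lookup is a single map access.
	byPrefix := make(map[string][]string)
	for _, kmer := range kmers {
		if len(kmer) < 2 {
			continue // too short to have an overlap
		}
		prefix := kmer[:len(kmer)-1]
		byPrefix[prefix] = append(byPrefix[prefix], kmer)
	}

	var edges []string
	for _, kmer := range kmers {
		if len(kmer) < 2 {
			continue
		}
		suffix := kmer[1:]
		for _, next := range byPrefix[suffix] {
			edges = append(edges, kmer+" -> "+next)
		}
	}
	sort.Strings(edges)
	return edges
}

func main() {
	kmers := []string{"ATGCG", "GCATG", "CATGC", "AGGCA", "GGCAT"}
	for _, edge := range overlapEdges(kmers) {
		fmt.Println(edge)
	}
}
```

Indexing by prefix avoids comparing every pair of k-mers, which matters for inputs like the 981-line data file in this commit.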

9
chapter3/chapter3_test.go

@@ -0,0 +1,9 @@
package rosalindchapter3

import "testing"

func TestChapter03(t *testing.T) {
	BA3a("for_real/rosalind_ba3a.txt")
	BA3b("for_real/rosalind_ba3b.txt")
	BA3c("for_real/rosalind_ba3c.txt")
}

2
chapter3/for_real/rosalind_ba3a.txt

@@ -0,0 +1,2 @@
50
GGGGGAAACTTACGGAGTACAAGAAGACCCGGCACAAAGAGAAAACACGTTGCTCGTTAGCTTAAGTTAAGACGTATCGGATATCTATCGTATCCTCGTAGTATTGCTAGCCACTTCACTGGACCAGGCTTACGTATTAGCCTTATGACCCCATTTCGTCTCCGCTGCTACAGCTGTGGAGTTGACGCGTCCGGTGGGCCCTCCGTTAGCAGGTCAGCTCATATTTTCGGCAAGAAATTACCCGGAACGGACCGAAAATGGGGTACAACATGCCCACCCACAACTTAGTACACAACGCTCAGCAAGTAGCTAACGACCGCTGCCGTCGTCAGTATTAGACGCACTTAACCGTACGGAATCCGTGAGTCCTGTTTCCGCCGATCGAATTACGCGCCCGGGTCGTGGGTCCAAAGGTGGCCGATCTCACGTACTGGTGAGTCGCGCGGTCACTTGGCTGTGAGGTCCACCGGCGGCCACAGTAATCTCTGGTGCACCCAGAATCGAGTCTGGATTGTGCACAAAGCTGCCCGCCTCTATTTCTCGGACCTGGCAGAACGCAACGGATGGGTTGAAGATTGGGCCGGTTCCGATGCCCCAAAGTACCCACATTTACTAGGGTGAGGCTGTTCTTTTGAGAGTAGAGACGAAAGCACCCCGACGTAACTGCTGCACACGGGGCTGCTCGGGATACTGTGCCGGAACTAGCGAGGCTCTACCCTCATCGGAAACCAGGCCTCATAATTCTTACAGCGTACTGTGTACTCCACAAGGAGCTGACCAGACATTCCACGTCCATGGATTCGGCTCATGCATACCTCCCGATCCACTCCTGAGCACATTGGATGGACACTTGAACGATGTCCTTAGCGCACGAGACATCAATTCGTGACGGTAGATTGCTCTCACCCTGATGCGGGTAAGTCACGTATTACCCGGCGTGCGGTATGTAGTAATACAGCTATCTACAACAAATGCAACCCGGCAGGTCTCCCATAGACCA

4976
chapter3/for_real/rosalind_ba3b.txt

File diff suppressed because it is too large

981
chapter3/for_real/rosalind_ba3c.txt

@@ -0,0 +1,981 @@
TGTACCTCGGGCTTTAAGGT
CGGAACGGGAATCCTGGGAG
ACTAGCTGCACTATCGGTTT
CGATCTCCAGGAAGATGTTG
AGCGGGAACTGCTGCAGGGC
GGGCTCCCGCGCCAGTGCCC
AATGCCTTTCTTTAGGCAGC
AGAAACCGAAATCGTGCCCC
TCATGGTTACAAAATTTACG
CCAGGTCAAAGGGAGCTATA
GCGCCCTGGTAAGTTACGCA
GACAATGGCCCGTGTGAATG
CGACTAGATAGATCCAATTG
ATTTTGGACCGCTGTCCTTC
CGACGTTCAGCGAAGAACGA
ACAGCCATCAAAGGAGAGCC
AGCAACAATAATTACGTCAC
AAATGGTCGAAACTAGCTGC
CAACCTGAATGTCGGGTCCG
AAAATTTTGGACCGCTGTCC
TGGTATCTCAAGAACCCTCT
GGAGGGCCAACCTGAATGTC
ACTTGTAACCCGTCGGAACG
GCCATTTGTGCCTTTTTGGT
GAACCCTCTCTCTACAGGCT
GTGCCTTTTTGGTGTGGAAC
AAGGTCCCACACAGCGTGCC
TCTCTCTACAGGCTAGTCAT
CTGCTGGAAACGACTAGATA
GCTTTAAGGTTAGGGGGTAA
AGCGAGGCCCTTGACACGAA
GATTGAATAGAGATATTCGA
CTCTCTCTACAGGCTAGTCA
CATCTTTTTGCCCATTATAA
CTGCAGGGCTCCCGCGCCAG
CATTAACACGACGTTCAGCG
TTCAGCGCCCTGGTAAGTTA
GAATGTGGAAATCGATCGTA
ATTCAGCGCCCTGGTAAGTT
TTTGGCCGAACCAGGTCAAA
ATCGGAGGAGTAGTTGGTAC
CGAAATCGTGCCCCATGTTC
GGTTGTGATCAGCGGCCGGG
GCAGCATTCAGCGCCCTGGT
GAGGTCCATTATTAACACGG
CCCATTATAATCCACATGGC
GTGATCAGCGGCCGGGACAA
GGGCCAACCTGAATGTCGGG
CGGAGGAGTAGTTGGTACCA
GCGAAGAACGAATACGACAG
GTCACCATTCCCAGCAAATG
CAACTTGTAACCCGTCGGAA
CCCGTGTGAATGTGGAAATC
ACAGAGCGAGGCCCTTGACA
TGTCCTTCCAGTGCTCGCGG
GCAATTACGAATGACCAGGC
AGAACGAATACGACAGAGCG
CTAGTCATGCGGGCTCGCAC
ATCCAGGAGAGATTGAATAG
GAGAGTCACATACACAATGC
TCCCCTGGAAGCGCGCTCCC
GTTGGTACCAACGCAAGAGG
AAATCGATCGTATACCGCAG
CAGCCATCAAAGGAGAGCCC
GTCCATTATTAACACGGGCT
TTAGGCAGCATTCAGCGCCC
GATCAGCGGCCGGGACAATG
ATTACGAATGACCAGGCTCA
GAGCCCCTTGCGCTGTATCG
TGGTCACTTGCTGCTGGAAA
AAAATTTACGTCCGAGCTAT
GGAGTAGTTGGTACCAACGC
GAAACGACTAGATAGATCCA
GTGCATCCAGGAGAGATTGA
GCCGAAATCACGCGCATTGT
TAGCAACAATAATTACGTCA
CTCAAGAACCCTCTCTCTAC
GAATCCTGGGAGGGCCAACC
TGCCCCATGTTCGCGTTTAA
CAATGCGGCTGTCGATCTCC
GGGAGCTATAACAGCCATCA
TACGAATGACCAGGCTCAAT
GACATCAAGGTCCCACACAG
CCTTCCAGTGCTCGCGGGGT
TTTCTTTAGGCAGCATTCAG
TCCCAGCAAATGCCTTTCTT
ACTGCTGCAGGGCTCCCGCG
CGTTCAGCGAAGAACGAATA
GGAACGGGAATCCTGGGAGG
GGCTAGTCATGCGGGCTCGC
GGTCGAAACTAGCTGCACTA
TCGGCAAAGTTAGAGCGGGA
CAGGTCAAAGGGAGCTATAA
ATAGCAACAATAATTACGTC
TAATTACGTCACCATTCCCA
GCAAAAATTTTGGACCGCTG
GGTCAAAGGGAGCTATAACA
TAGCTTCACAAATGGTCGAA
GGAACTGCTGCAGGGCTCCC
CACTTGCTGCTGGAAACGAC
CCTGAATGTCGGGTCCGAGT
AGGTTAGGGGGTAAGGCCAT
CCAGCAAATGCCTTTCTTTA
GGAGAGCCCCTTGCGCTGTA
CCCATTCTTCCCCTGGAAGC
CCATTCCCAGCAAATGCCTT
TCGAGAGTCACATACACAAT
AACCCTCTCTCTACAGGCTA
CATGTTCGCGTTTAAGATGA
GATCCAATTGGCCGAAATCA
GTAGTTGGTACCAACGCAAG
AACGGGAATCCTGGGAGGGC
TGGAAATCGATCGTATACCG
ACCGCTGTCCTTCCAGTGCT
CGAGCTATAGCAGCAAAAAT
AGCCCCTTGCGCTGTATCGT
GATATTCGAGAGTCACATAC
CGAGTTTGGCCGAACCAGGT
ACCAGGTCAAAGGGAGCTAT
CAAAGGTGGTTGTGATCAGC
TCAGCGGCCGGGACAATGGC
GGTAAGGCCATTTGTGCCTT
GAAGTTCGGCAAAGTTAGAG
AAAAATGGTATCTCAAGAAC
AAACGACTAGATAGATCCAA
ATCGTAGCTTCACAAATGGT
TGAGTGACATCAAGGTCCCA
CATTCTTCCCCTGGAAGCGC
GGTCCCACACAGCGTGCCAT
CAATTACGAATGACCAGGCT
CTTGTAACCCGTCGGAACGG
TGATCAGCGGCCGGGACAAT
ATCCACATGGCTATAGCAAC
GGCTGTCGATCTCCAGGAAG
CACATGGCTATAGCAACAAT
TTAAGATGAGTGACATCAAG
GAGATTGAATAGAGATATTC
CCTGGGAGGGCCAACCTGAA
GGGCTTTAAGGTTAGGGGGT
AGAGCGAGGCCCTTGACACG
CCCACACAGCGTGCCATTCA
TGGCTATAGCAACAATAATT
CCTGGTAAGTTACGCATTAA
GCCCTGGTAAGTTACGCATT
CCCGTCGGAACGGGAATCCT
ACGTTCAGCGAAGAACGAAT
AAGTTACGCATTAACACGAC
GCATCCAGGAGAGATTGAAT
GGTTTGGTCACTTGCTGCTG
CCCGCGCCAGTGCCCATCTC
CCGCTGTCCTTCCAGTGCTC
GTGTACCTCGGGCTTTAAGG
ACAAAGGTGGTTGTGATCAG
CAGTGCTCGCGGGGTGTACC
AATCGTGCCCCATGTTCGCG
CCTTTATCTCGTGCATCCAG
GGGAACTGCTGCAGGGCTCC
GAAATCGTGCCCCATGTTCG
TAACACGGGCTCATCTTTTT
AACACTAAACAAAGGTGGTT
ATGTTCGCGTTTAAGATGAG
GCTGCTGGAAACGACTAGAT
GTATCTCAAGAACCCTCTCT
CCAACGCAAGAGGTCCATTA
CACGGGCTCATCTTTTTGCC
TGCTGCTGGAAACGACTAGA
TTGTTGTCGTCATGGTTACA
CGCACGCGGTCAACTTGTAA
GCCGGGACAATGGCCCGTGT
GGAATCCTGGGAGGGCCAAC
TAGATAGATCCAATTGGCCG
AATGGCCCGTGTGAATGTGG
CTTCACAAATGGTCGAAACT
AGAGCGGGAACTGCTGCAGG
TCCAGTGCTCGCGGGGTGTA
ACACAGCGTGCCATTCATCC
TCTTTAGGCAGCATTCAGCG
CCAGTGCCCATCTCGGCGAA
TACAGGCTAGTCATGCGGGC
GTCGGAACGGGAATCCTGGG
TCGGAACGGGAATCCTGGGA
GAGTAGTTGGTACCAACGCA
CCCTGGAAGCGCGCTCCCGC
AACAATAATTACGTCACCAT
CTGAATGTCGGGTCCGAGTT
CGCGTTTAAGATGAGTGACA
TCGGGCTTTAAGGTTAGGGG
ATACCGCAGAAGTTCGGCAA
ACGAATACGACAGAGCGAGG
TGGTTACAAAATTTACGTCC
CGCATTAACACGACGTTCAG
GCGAAGAAACCGAAATCGTG
ACTAAACAAAGGTGGTTGTG
TTGCCCATTATAATCCACAT
CCGGGACAATGGCCCGTGTG
TCCAGGAGAGATTGAATAGA
TAATCCACATGGCTATAGCA
AATGGTCGAAACTAGCTGCA
AAGAACGAATACGACAGAGC
AATCCTGGGAGGGCCAACCT
CGAAGAACGAATACGACAGA
AATGCGGCTGTCGATCTCCA
AGATGAGTGACATCAAGGTC
ATCTCAAGAACCCTCTCTCT
GTGCCCATCTCGGCGAAAAA
ATAGATCCAATTGGCCGAAA
AGGCCCTTGACACGAACACT
CACAAATGGTCGAAACTAGC
ATTGTTGTCGTCATGGTTAC
TGTCGATCTCCAGGAAGATG
GTTCGGCAAAGTTAGAGCGG
GTCAAAGGGAGCTATAACAG
CACGACGTTCAGCGAAGAAC
CGTGCCCCATGTTCGCGTTT
TGGACCGCTGTCCTTCCAGT
TCTCGTGCATCCAGGAGAGA
AGTGACATCAAGGTCCCACA
CTCTACAGGCTAGTCATGCG
CAGCGGCCGGGACAATGGCC
CGTCCGAGCTATAGCAGCAA
GCGGCCGGGACAATGGCCCG
GAAGAAACCGAAATCGTGCC
AGGAGAGCCCCTTGCGCTGT
CAAGGTCCCACACAGCGTGC
GTGCTCGCGGGGTGTACCTC
CTTCCCCTGGAAGCGCGCTC
GGCTCAATCGGAGGAGTAGT
GAACCCATTCTTCCCCTGGA
GTATACCGCAGAAGTTCGGC
GAATGACCAGGCTCAATCGG
ATAACAGCCATCAAAGGAGA
GAGCTATAACAGCCATCAAA
GGTCCGAGTTTGGCCGAACC
CCCCTTGCGCTGTATCGTAG
CTGGTAAGTTACGCATTAAC
CCTTGACACGAACACTAAAC
AGATCCAATTGGCCGAAATC
AGTGCTCGCGGGGTGTACCT
GTCAACTTGTAACCCGTCGG
TACGCATTAACACGACGTTC
ACAAAATTTACGTCCGAGCT
GCAACAATAATTACGTCACC
CAATTGGCCGAAATCACGCG
ACCCGTCGGAACGGGAATCC
GGTTACAAAATTTACGTCCG
ACGACAGAGCGAGGCCCTTG
ATTGAATAGAGATATTCGAG
TCATGCGGGCTCGCACGCGG
GCAAAGTTAGAGCGGGAACT
TGTATCGTAGCTTCACAAAT
CCCTCTCTCTACAGGCTAGT
TCTACAGGCTAGTCATGCGG
GATAGATCCAATTGGCCGAA
CCGAAATCGTGCCCCATGTT
TTTAAGGTTAGGGGGTAAGG
CCGCCTTTATCTCGTGCATC
GGCCGGGACAATGGCCCGTG
AATGTGGAAATCGATCGTAT
ACACGGGCTCATCTTTTTGC
GAATGTCGGGTCCGAGTTTG
TGGAAGCGCGCTCCCGCCTT
GCTCCCGCCTTTATCTCGTG
CGCCCTGGTAAGTTACGCAT
GTCGTCATGGTTACAAAATT
GCGGTCAACTTGTAACCCGT
ATCGTGCCCCATGTTCGCGT
GAAACCGAAATCGTGCCCCA
TGTGGTGCAATTACGAATGA
CGAAATCACGCGCATTGTTG
AACCTGAATGTCGGGTCCGA
CGCCTTTATCTCGTGCATCC
GTTTAAGATGAGTGACATCA
ATAATTACGTCACCATTCCC
AGGAGTAGTTGGTACCAACG
GGTTAGGGGGTAAGGCCATT
TCATCTTTTTGCCCATTATA
ACAAATGGTCGAAACTAGCT
TCGCGGGGTGTACCTCGGGC
CGGGCTCATCTTTTTGCCCA
GCCATCAAAGGAGAGCCCCT
TTTACGTCCGAGCTATAGCA
TGCGCTGTATCGTAGCTTCA
ACGACTAGATAGATCCAATT
ACAATGCGGCTGTCGATCTC
GTGAATGTGGAAATCGATCG
CCGAAATCACGCGCATTGTT
TGGTGCGAAGAAACCGAAAT
CTATCGGTTTGGTCACTTGC
CCCAGCAAATGCCTTTCTTT
ACCCTCTCTCTACAGGCTAG
GCTGCAGGGCTCCCGCGCCA
TGCGGCTGTCGATCTCCAGG
CCAGGCTCAATCGGAGGAGT
TTCGAGAGTCACATACACAA
AATTACGTCACCATTCCCAG
GGTAAGTTACGCATTAACAC
CTGTGGTGCAATTACGAATG
TGTTGGTGCGAAGAAACCGA
AGGGCCAACCTGAATGTCGG
AAGGTTAGGGGGTAAGGCCA
AATTTACGTCCGAGCTATAG
ATACACAATGCGGCTGTCGA
TCGTGCCCCATGTTCGCGTT
CAGAAGTTCGGCAAAGTTAG
TTGAATAGAGATATTCGAGA
TCAGCGCCCTGGTAAGTTAC
AAATCGTGCCCCATGTTCGC
ATGCCTTTCTTTAGGCAGCA
TGTCGTCATGGTTACAAAAT
CCATGTTCGCGTTTAAGATG
GCATTCAGCGCCCTGGTAAG
CCCATGTTCGCGTTTAAGAT
ATCCTGGGAGGGCCAACCTG
AACACGACGTTCAGCGAAGA
GAATAGAGATATTCGAGAGT
GTACCTCGGGCTTTAAGGTT
CAGGCTCAATCGGAGGAGTA
TTTAGGCAGCATTCAGCGCC
ATTCTTCCCCTGGAAGCGCG
ACGAATGACCAGGCTCAATC
GGCTCATCTTTTTGCCCATT
GAGCGAGGCCCTTGACACGA
AATGACCAGGCTCAATCGGA
AAGAGGTCCATTATTAACAC
GGGTGTACCTCGGGCTTTAA
TTTTTGGTGTGGAACCCATT
TGACCAGGCTCAATCGGAGG
ACACTAAACAAAGGTGGTTG
GGGGTGTACCTCGGGCTTTA
ACGTCACCATTCCCAGCAAA
CGAAACTAGCTGCACTATCG
CCTCGGGCTTTAAGGTTAGG
TTAACACGGGCTCATCTTTT
AAATGGTATCTCAAGAACCC
TGACATCAAGGTCCCACACA
GAGGCCCTTGACACGAACAC
CTTGCGCTGTATCGTAGCTT
GTCCGAGTTTGGCCGAACCA
GGAGGAGTAGTTGGTACCAA
TTCCAGTGCTCGCGGGGTGT
GAACGAATACGACAGAGCGA
CGGGTCCGAGTTTGGCCGAA
AATTACGAATGACCAGGCTC
GTCGGGTCCGAGTTTGGCCG
TATAGCAGCAAAAATTTTGG
AGTAGTTGGTACCAACGCAA
TGCTGGAAACGACTAGATAG
TGGCCCGTGTGAATGTGGAA
TGTTCGCGTTTAAGATGAGT
AGCGGCCGGGACAATGGCCC
ATTCCCAGCAAATGCCTTTC
TACCTCGGGCTTTAAGGTTA
AGTCATGCGGGCTCGCACGC
GGCAGCATTCAGCGCCCTGG
CGAGAGTCACATACACAATG
GTGTGGAACCCATTCTTCCC
CATCCAGGAGAGATTGAATA
GGCCGAAATCACGCGCATTG
TTGACACGAACACTAAACAA
TCTCAAGAACCCTCTCTCTA
TGCATCCAGGAGAGATTGAA
ATCTCCAGGAAGATGTTGGT
CACATACACAATGCGGCTGT
TTGGTGCGAAGAAACCGAAA
AGTGCCCATCTCGGCGAAAA
GCGCTCCCGCCTTTATCTCG
TAGATCCAATTGGCCGAAAT
GGCCCTTGACACGAACACTA
CAAAAATTTTGGACCGCTGT
CATCAAGGTCCCACACAGCG
CAGCAAATGCCTTTCTTTAG
CATACACAATGCGGCTGTCG
ATCGTATACCGCAGAAGTTC
GAATACGACAGAGCGAGGCC
TCAACTTGTAACCCGTCGGA
TGTAACCCGTCGGAACGGGA
TGCAATTACGAATGACCAGG
TCGTAGCTTCACAAATGGTC
TCCATTATTAACACGGGCTC
TTGGCCGAAATCACGCGCAT
CCTCTCTCTACAGGCTAGTC
ATTGGCCGAAATCACGCGCA
GCTAGTCATGCGGGCTCGCA
AGGTCCCACACAGCGTGCCA
GGAAATCGATCGTATACCGC
CGCGCCAGTGCCCATCTCGG
TCGATCGTATACCGCAGAAG
CCAATTGGCCGAAATCACGC
ATACGACAGAGCGAGGCCCT
CGCTCCCGCCTTTATCTCGT
TGGCCGAAATCACGCGCATT
AATAATTACGTCACCATTCC
CAGCGAAGAACGAATACGAC
TACGACAGAGCGAGGCCCTT
GAAGCGCGCTCCCGCCTTTA
TCTTCCCCTGGAAGCGCGCT
GAGTGACATCAAGGTCCCAC
GCGGGCTCGCACGCGGTCAA
CGGCTGTCGATCTCCAGGAA
CCGTCGGAACGGGAATCCTG
GTAAGTTACGCATTAACACG
ATCTTTTTGCCCATTATAAT
AATACGACAGAGCGAGGCCC
GACACGAACACTAAACAAAG
GTGTGAATGTGGAAATCGAT
TGCACTATCGGTTTGGTCAC
GTCATGCGGGCTCGCACGCG
ACCCATTCTTCCCCTGGAAG
ATCTCGTGCATCCAGGAGAG
AAGTTAGAGCGGGAACTGCT
CTATAGCAACAATAATTACG
GTGCAATTACGAATGACCAG
CAGGCTAGTCATGCGGGCTC
ATAATCCACATGGCTATAGC
GCTATAACAGCCATCAAAGG
AGATGTTGGTGCGAAGAAAC
ATGGTCGAAACTAGCTGCAC
GGAGAGATTGAATAGAGATA
CGACAGAGCGAGGCCCTTGA
TCGCACGCGGTCAACTTGTA
GAAATCACGCGCATTGTTGT
TATACCGCAGAAGTTCGGCA
CAGCATTCAGCGCCCTGGTA
TTTTGGTGTGGAACCCATTC
TATAATCCACATGGCTATAG
ATCACGCGCATTGTTGTCGT
ATTATAATCCACATGGCTAT
TGTCGGGTCCGAGTTTGGCC
AAGTTCGGCAAAGTTAGAGC
ACGGGAATCCTGGGAGGGCC
ATTTACGTCCGAGCTATAGC
CTGGAAGCGCGCTCCCGCCT
GCAAATGCCTTTCTTTAGGC
TTACGCATTAACACGACGTT
AGGTGGTTGTGATCAGCGGC
GCCCGTGTGAATGTGGAAAT
GTCCGAGCTATAGCAGCAAA
AGGTCAAAGGGAGCTATAAC
GCGGCTGTCGATCTCCAGGA
TTGCGCTGTATCGTAGCTTC
AAAGTTAGAGCGGGAACTGC
TTCCCCTGGAAGCGCGCTCC
CCCCATGTTCGCGTTTAAGA
TATCTCAAGAACCCTCTCTC
CCGCGCCAGTGCCCATCTCG
GGAACCCATTCTTCCCCTGG
ATAGCAGCAAAAATTTTGGA
CGTTTAAGATGAGTGACATC
TATAACAGCCATCAAAGGAG
CCAGGAAGATGTTGGTGCGA
TACGTCACCATTCCCAGCAA
GCCTTTCTTTAGGCAGCATT
AACAGCCATCAAAGGAGAGC
GCATTAACACGACGTTCAGC
CTCCCGCCTTTATCTCGTGC
CGTGCATCCAGGAGAGATTG
CTGGAAACGACTAGATAGAT
GCATTGTTGTCGTCATGGTT
CGGCCGGGACAATGGCCCGT
TTATTAACACGGGCTCATCT
TAGCAGCAAAAATTTTGGAC
TCAAGAACCCTCTCTCTACA
AGTCACATACACAATGCGGC
CAACGCAAGAGGTCCATTAT
CGGGCTCGCACGCGGTCAAC
CTCGCGGGGTGTACCTCGGG
ATGTGGAAATCGATCGTATA
AGCTATAACAGCCATCAAAG
TTAAGGTTAGGGGGTAAGGC
TCCAATTGGCCGAAATCACG
AACGACTAGATAGATCCAAT
GGCCAACCTGAATGTCGGGT
CCTTTCTTTAGGCAGCATTC
AGTTGGTACCAACGCAAGAG
TGTGAATGTGGAAATCGATC
CATGGTTACAAAATTTACGT
TTGCTGCTGGAAACGACTAG
CATGCGGGCTCGCACGCGGT
CAACAATAATTACGTCACCA
GAGAGATTGAATAGAGATAT
GTGCGAAGAAACCGAAATCG
GGGCTCGCACGCGGTCAACT
GCAAGAGGTCCATTATTAAC
GTCACTTGCTGCTGGAAACG
GGCCGAACCAGGTCAAAGGG
CGCGCATTGTTGTCGTCATG
CATTATAATCCACATGGCTA
GGTGCAATTACGAATGACCA
GGGAATCCTGGGAGGGCCAA
CATCTCGGCGAAAAATGGTA
CGTGTGAATGTGGAAATCGA
GCTATAGCAACAATAATTAC
CCGCAGAAGTTCGGCAAAGT
ACTAGATAGATCCAATTGGC
AAACAAAGGTGGTTGTGATC
TCGCGTTTAAGATGAGTGAC
TCGGGTCCGAGTTTGGCCGA
AAAATGGTATCTCAAGAACC
GGACAATGGCCCGTGTGAAT
GACCAGGCTCAATCGGAGGA
CAATAATTACGTCACCATTC
ATCGGTTTGGTCACTTGCTG
TCGGCGAAAAATGGTATCTC
AACCCATTCTTCCCCTGGAA
TCCCGCCTTTATCTCGTGCA
GATCTCCAGGAAGATGTTGG
GAAAAATGGTATCTCAAGAA
GGCCCGTGTGAATGTGGAAA
GTCCTTCCAGTGCTCGCGGG
TTGTAACCCGTCGGAACGGG
ACGCAAGAGGTCCATTATTA
GCTCGCGGGGTGTACCTCGG
TAAGGTTAGGGGGTAAGGCC
CCCCTGGAAGCGCGCTCCCG
GCTATAGCAGCAAAAATTTT
TCTCGGCGAAAAATGGTATC
ATCCAATTGGCCGAAATCAC
ATGGCCCGTGTGAATGTGGA
GCCCCTTGCGCTGTATCGTA
TTGTCGTCATGGTTACAAAA
AACCGAAATCGTGCCCCATG
CCAGTGCTCGCGGGGTGTAC
CCATTATAATCCACATGGCT
ATGTTGGTGCGAAGAAACCG
ATGGTTACAAAATTTACGTC
AGTTTGGCCGAACCAGGTCA
GTACCAACGCAAGAGGTCCA
GAGGAGTAGTTGGTACCAAC
CAAAGTTAGAGCGGGAACTG
TTAGGGGGTAAGGCCATTTG
GAAGATGTTGGTGCGAAGAA
ATGGCTATAGCAACAATAAT
AAATCACGCGCATTGTTGTC
TATTCGAGAGTCACATACAC
AAATGCCTTTCTTTAGGCAG
CGAATACGACAGAGCGAGGC
AGATAGATCCAATTGGCCGA
TGACACGAACACTAAACAAA
GCCAACCTGAATGTCGGGTC
CTGCTGCAGGGCTCCCGCGC
CTCGTGCATCCAGGAGAGAT
TTTGCCCATTATAATCCACA
ACGAACACTAAACAAAGGTG
TGCAGGGCTCCCGCGCCAGT
AGAAGTTCGGCAAAGTTAGA
TAGAGATATTCGAGAGTCAC
CTGCACTATCGGTTTGGTCA
GGCCATTTGTGCCTTTTTGG
CACACAGCGTGCCATTCATC
CAAATGGTCGAAACTAGCTG
TTACAAAATTTACGTCCGAG
AACCAGGTCAAAGGGAGCTA
GCTCAATCGGAGGAGTAGTT
CCCTGGTAAGTTACGCATTA
CCGAACCAGGTCAAAGGGAG
TCCTTCCAGTGCTCGCGGGG
GCGGGAACTGCTGCAGGGCT
ACACGAACACTAAACAAAGG
GGGACAATGGCCCGTGTGAA
TCAATCGGAGGAGTAGTTGG
TAAGATGAGTGACATCAAGG
CCACATGGCTATAGCAACAA
TGAATAGAGATATTCGAGAG
AGAGATTGAATAGAGATATT
ACGGGCTCATCTTTTTGCCC
CTTTTTGCCCATTATAATCC
CTCGCACGCGGTCAACTTGT
CTTCCAGTGCTCGCGGGGTG
CTCATCTTTTTGCCCATTAT
AGGCCATTTGTGCCTTTTTG
AATTTTGGACCGCTGTCCTT
ATTTGTGCCTTTTTGGTGTG
ACACGACGTTCAGCGAAGAA
TTGGTCACTTGCTGCTGGAA
GTATCGTAGCTTCACAAATG
ACGCGCATTGTTGTCGTCAT
CCCTTGCGCTGTATCGTAGC
GGAAGCGCGCTCCCGCCTTT
CGTATACCGCAGAAGTTCGG
AAAGGAGAGCCCCTTGCGCT
GATGAGTGACATCAAGGTCC
CGCTGTATCGTAGCTTCACA
TCACATACACAATGCGGCTG
AAGAACCCTCTCTCTACAGG
CAGAGCGAGGCCCTTGACAC
CCACACAGCGTGCCATTCAT
GGTCAACTTGTAACCCGTCG
TATCTCGTGCATCCAGGAGA
AAGGAGAGCCCCTTGCGCTG
GGTCCATTATTAACACGGGC
GTCGATCTCCAGGAAGATGT
TTAGAGCGGGAACTGCTGCA
AAAGGTGGTTGTGATCAGCG
TGGTTGTGATCAGCGGCCGG
GTTAGAGCGGGAACTGCTGC
CCAACCTGAATGTCGGGTCC
GTTCAGCGAAGAACGAATAC
CTCCCGCGCCAGTGCCCATC
TCGTGCATCCAGGAGAGATT
CGCGGGGTGTACCTCGGGCT
CACTAAACAAAGGTGGTTGT
CCCTTGACACGAACACTAAA
CGGTTTGGTCACTTGCTGCT
ACTATCGGTTTGGTCACTTG
TGGAAACGACTAGATAGATC
GTGGTGCAATTACGAATGAC
TGGTGCAATTACGAATGACC
GACGTTCAGCGAAGAACGAA
AATCGATCGTATACCGCAGA
TTTGTGCCTTTTTGGTGTGG
CATTCAGCGCCCTGGTAAGT
GCGCCAGTGCCCATCTCGGC
GTAACCCGTCGGAACGGGAA
AACTGCTGCAGGGCTCCCGC
AGAGATATTCGAGAGTCACA
GTCCCACACAGCGTGCCATT
AGCTATAGCAGCAAAAATTT
TTTTGGACCGCTGTCCTTCC
TCGGAGGAGTAGTTGGTACC
CTAGCTGCACTATCGGTTTG
TCCAGGAAGATGTTGGTGCG
AGATTGAATAGAGATATTCG
CAGGAAGATGTTGGTGCGAA
TCAAAGGAGAGCCCCTTGCG
CCATTATTAACACGGGCTCA
AAGGGAGCTATAACAGCCAT
CTATAACAGCCATCAAAGGA
AAGGTGGTTGTGATCAGCGG
GAGGGCCAACCTGAATGTCG
GCGAAAAATGGTATCTCAAG
TTATAATCCACATGGCTATA
AGGCTCAATCGGAGGAGTAG
TGCGGGCTCGCACGCGGTCA
TCACTTGCTGCTGGAAACGA
AGGGAGCTATAACAGCCATC
GGACCGCTGTCCTTCCAGTG
TTCACAAATGGTCGAAACTA
TAACACGACGTTCAGCGAAG
CCGTGTGAATGTGGAAATCG
CGGGCTTTAAGGTTAGGGGG
TTGTGATCAGCGGCCGGGAC
GTGACATCAAGGTCCCACAC
TGGCCGAACCAGGTCAAAGG
GACCGCTGTCCTTCCAGTGC
TTCTTCCCCTGGAAGCGCGC
GTCGAAACTAGCTGCACTAT
TACGTCCGAGCTATAGCAGC
CCTTGCGCTGTATCGTAGCT
CGGTCAACTTGTAACCCGTC
CAGTGCCCATCTCGGCGAAA
GCTTCACAAATGGTCGAAAC
GGTGGTTGTGATCAGCGGCC
TGTGCCTTTTTGGTGTGGAA
GGCTCCCGCGCCAGTGCCCA
TGTGGAACCCATTCTTCCCC
TCTTTTTGCCCATTATAATC
CTTTATCTCGTGCATCCAGG
AACGAATACGACAGAGCGAG
GATGTTGGTGCGAAGAAACC
TGCCCATTATAATCCACATG
AATCGGAGGAGTAGTTGGTA
GGGTAAGGCCATTTGTGCCT
GTTTGGCCGAACCAGGTCAA
TCTCTACAGGCTAGTCATGC
GAAGAACGAATACGACAGAG
CGCGCTCCCGCCTTTATCTC
AATCCACATGGCTATAGCAA
CTTTTTGGTGTGGAACCCAT
AGCGCGCTCCCGCCTTTATC
GAGCTATAGCAGCAAAAATT
AGCAAAAATTTTGGACCGCT
TAGTTGGTACCAACGCAAGA
CAAAGGAGAGCCCCTTGCGC
ACAGGCTAGTCATGCGGGCT
GAGAGCCCCTTGCGCTGTAT
GTTTGGTCACTTGCTGCTGG
ATTATTAACACGGGCTCATC
GAAACTAGCTGCACTATCGG
GCTGTCCTTCCAGTGCTCGC
TAAACAAAGGTGGTTGTGAT
CGTCACCATTCCCAGCAAAT
TCCGAGCTATAGCAGCAAAA
GGTGTGGAACCCATTCTTCC
ACCGCAGAAGTTCGGCAAAG
GGTATCTCAAGAACCCTCTC
CGCTGTCCTTCCAGTGCTCG
CATTTGTGCCTTTTTGGTGT
CTCTCTACAGGCTAGTCATG
CAGCGCCCTGGTAAGTTACG
GCCTTTATCTCGTGCATCCA
ACTTGCTGCTGGAAACGACT
AACCCGTCGGAACGGGAATC
CCGAGTTTGGCCGAACCAGG
CTGGGAGGGCCAACCTGAAT
GGTCACTTGCTGCTGGAAAC
CTCCAGGAAGATGTTGGTGC
CGCGGTCAACTTGTAACCCG
CACTATCGGTTTGGTCACTT
GTCATGGTTACAAAATTTAC
AAGATGTTGGTGCGAAGAAA
GCAGGGCTCCCGCGCCAGTG
CTGTCGATCTCCAGGAAGAT
TTCTTTAGGCAGCATTCAGC
CACGCGGTCAACTTGTAACC
TGCCTTTTTGGTGTGGAACC
GAAATCGATCGTATACCGCA
AGGCAGCATTCAGCGCCCTG
GCACGCGGTCAACTTGTAAC
CAGGGCTCCCGCGCCAGTGC
TTTAAGATGAGTGACATCAA
CACGCGCATTGTTGTCGTCA
GAGTTTGGCCGAACCAGGTC
ACCGAAATCGTGCCCCATGT
AGGTCCATTATTAACACGGG
CAAGAACCCTCTCTCTACAG
TTTGGTCACTTGCTGCTGGA
CTTGCTGCTGGAAACGACTA
ACACAATGCGGCTGTCGATC
GTTACGCATTAACACGACGT
AACTTGTAACCCGTCGGAAC
AGCCATCAAAGGAGAGCCCC
GCTGTCGATCTCCAGGAAGA
CAAAGGGAGCTATAACAGCC
TGGGAGGGCCAACCTGAATG
GACTAGATAGATCCAATTGG
TTGTGCCTTTTTGGTGTGGA
GCGTTTAAGATGAGTGACAT
CGGGGTGTACCTCGGGCTTT
GCGCGCTCCCGCCTTTATCT
ATGCGGCTGTCGATCTCCAG
CGCAGAAGTTCGGCAAAGTT
ATGACCAGGCTCAATCGGAG
CATCAAAGGAGAGCCCCTTG
ATAGAGATATTCGAGAGTCA
TGCTCGCGGGGTGTACCTCG
TTGGTACCAACGCAAGAGGT
AGCATTCAGCGCCCTGGTAA
AGGGCTCCCGCGCCAGTGCC
AACTAGCTGCACTATCGGTT
CAATCGGAGGAGTAGTTGGT
GTTCGCGTTTAAGATGAGTG
GTAGCTTCACAAATGGTCGA
CATTGTTGTCGTCATGGTTA
GGGGTAAGGCCATTTGTGCC
CGAAAAATGGTATCTCAAGA
ATCAGCGGCCGGGACAATGG
CTGTCCTTCCAGTGCTCGCG
GGGAGGGCCAACCTGAATGT
TCACGCGCATTGTTGTCGTC
GATCGTATACCGCAGAAGTT
AGCGCCCTGGTAAGTTACGC
TAAGTTACGCATTAACACGA
CCGAGCTATAGCAGCAAAAA
TAGAGCGGGAACTGCTGCAG
ACATCAAGGTCCCACACAGC
ATCGATCGTATACCGCAGAA
GTGGAACCCATTCTTCCCCT
ATTAACACGACGTTCAGCGA
CGGCGAAAAATGGTATCTCA
TAACAGCCATCAAAGGAGAG
TGCCCATCTCGGCGAAAAAT
AATAGAGATATTCGAGAGTC
AAGCGCGCTCCCGCCTTTAT
CTTTAGGCAGCATTCAGCGC
GCACTATCGGTTTGGTCACT
GGCTTTAAGGTTAGGGGGTA
GAGATATTCGAGAGTCACAT
TCAGCGAAGAACGAATACGA
AAATTTACGTCCGAGCTATA
GTGGAAATCGATCGTATACC
GCCCATTATAATCCACATGG
TAAGGCCATTTGTGCCTTTT
CTTTCTTTAGGCAGCATTCA
CTGTATCGTAGCTTCACAAA
TTGGTGTGGAACCCATTCTT
GGAGCTATAACAGCCATCAA
TTCGCGTTTAAGATGAGTGA
AGGGGGTAAGGCCATTTGTG
TGAATGTGGAAATCGATCGT
ACGCATTAACACGACGTTCA
ATTAACACGGGCTCATCTTT
AGGAAGATGTTGGTGCGAAG
GAGCGGGAACTGCTGCAGGG
AATGGTATCTCAAGAACCCT
CTACAGGCTAGTCATGCGGG
CCAGGAGAGATTGAATAGAG
CGAAGAAACCGAAATCGTGC
GAACTGCTGCAGGGCTCCCG
ACCAGGCTCAATCGGAGGAG
TATCGGTTTGGTCACTTGCT
TTTTGCCCATTATAATCCAC
TTCAGCGAAGAACGAATACG
GGCTCGCACGCGGTCAACTT
TCACCATTCCCAGCAAATGC
TGCGAAGAAACCGAAATCGT
AGAGGTCCATTATTAACACG
GGCGAAAAATGGTATCTCAA
CACAATGCGGCTGTCGATCT
TACCAACGCAAGAGGTCCAT
TACACAATGCGGCTGTCGAT
CACGAACACTAAACAAAGGT
GAACACTAAACAAAGGTGGT
CTCAATCGGAGGAGTAGTTG
CGCATTGTTGTCGTCATGGT
AGATATTCGAGAGTCACATA
ACCTGAATGTCGGGTCCGAG
TTACGAATGACCAGGCTCAA
GCCCCATGTTCGCGTTTAAG
AGCTTCACAAATGGTCGAAA
GAACGGGAATCCTGGGAGGG
CGGGAACTGCTGCAGGGCTC
TGGTCGAAACTAGCTGCACT
TCGATCTCCAGGAAGATGTT
CCATTCTTCCCCTGGAAGCG
AAGATGAGTGACATCAAGGT
CATTCCCAGCAAATGCCTTT
TCCACATGGCTATAGCAACA
GCTCCCGCGCCAGTGCCCAT
CGGCAAAGTTAGAGCGGGAA
CTTTAAGGTTAGGGGGTAAG
CCCATCTCGGCGAAAAATGG
CCATTTGTGCCTTTTTGGTG
GCTGTATCGTAGCTTCACAA
CCTGGAAGCGCGCTCCCGCC
AAACTAGCTGCACTATCGGT
AGAACCCTCTCTCTACAGGC
CGGGACAATGGCCCGTGTGA
CATTATTAACACGGGCTCAT
GCGCATTGTTGTCGTCATGG
CTCGGCGAAAAATGGTATCT
AACGCAAGAGGTCCATTATT
ACGACGTTCAGCGAAGAACG
GCTGGAAACGACTAGATAGA
ACATGGCTATAGCAACAATA
CGGGAATCCTGGGAGGGCCA
CAAATGCCTTTCTTTAGGCA
TCGTCATGGTTACAAAATTT
TAGGGGGTAAGGCCATTTGT
GCAGCAAAAATTTTGGACCG
TTTGGTGTGGAACCCATTCT
ACGTCCGAGCTATAGCAGCA
ATGTCGGGTCCGAGTTTGGC
CTAGATAGATCCAATTGGCC
GTTGGTGCGAAGAAACCGAA
CGCAAGAGGTCCATTATTAA
GGCTATAGCAACAATAATTA
AAGAAACCGAAATCGTGCCC
GCCAGTGCCCATCTCGGCGA
AAACCGAAATCGTGCCCCAT
GGGCTCATCTTTTTGCCCAT
GCCGAACCAGGTCAAAGGGA
TTATCTCGTGCATCCAGGAG
AAGGCCATTTGTGCCTTTTT
TGTGATCAGCGGCCGGGACA
GTCACATACACAATGCGGCT
GGCAAAGTTAGAGCGGGAAC
CGCCAGTGCCCATCTCGGCG
ACGCGGTCAACTTGTAACCC
GCTCATCTTTTTGCCCATTA
CGAACCAGGTCAAAGGGAGC
TATCGTAGCTTCACAAATGG
GAGTCACATACACAATGCGG
AGAGCCCCTTGCGCTGTATC
ATCAAGGTCCCACACAGCGT
TAGTCATGCGGGCTCGCACG
GTTAGGGGGTAAGGCCATTT
CAATGGCCCGTGTGAATGTG
GCGAGGCCCTTGACACGAAC
TCAAAGGGAGCTATAACAGC
CAAGAGGTCCATTATTAACA
TCTCCAGGAAGATGTTGGTG
TCCCGCGCCAGTGCCCATCT
AATCACGCGCATTGTTGTCG
ATCTCGGCGAAAAATGGTAT
AATGTCGGGTCCGAGTTTGG
TCAAGGTCCCACACAGCGTG
GGTACCAACGCAAGAGGTCC
CGTCATGGTTACAAAATTTA
GACAGAGCGAGGCCCTTGAC
GGAAGATGTTGGTGCGAAGA
AACACGGGCTCATCTTTTTG
CTATAGCAGCAAAAATTTTG
ATGGTATCTCAAGAACCCTC
TGTGGAAATCGATCGTATAC
TGCCTTTCTTTAGGCAGCAT
TCGGTTTGGTCACTTGCTGC
TGGTACCAACGCAAGAGGTC
CGAATGACCAGGCTCAATCG
TTGGACCGCTGTCCTTCCAG
TGTTGTCGTCATGGTTACAA
TAGGCAGCATTCAGCGCCCT
GCCCATCTCGGCGAAAAATG
ACAATAATTACGTCACCATT
TACAAAATTTACGTCCGAGC
AATTGGCCGAAATCACGCGC
TTTGGACCGCTGTCCTTCCA
GCGGGGTGTACCTCGGGCTT
GCAGAAGTTCGGCAAAGTTA
CCCGCCTTTATCTCGTGCAT
CAAAATTTACGTCCGAGCTA
CCATCTCGGCGAAAAATGGT
TGAATGTCGGGTCCGAGTTT
TCCTGGGAGGGCCAACCTGA
TCCGAGTTTGGCCGAACCAG
AGGCTAGTCATGCGGGCTCG
TATTAACACGGGCTCATCTT
GCTGCACTATCGGTTTGGTC
TATAGCAACAATAATTACGT
GTTACAAAATTTACGTCCGA
GGGTCCGAGTTTGGCCGAAC
GCCTTTTTGGTGTGGAACCC
AGGAGAGATTGAATAGAGAT
TGGTGTGGAACCCATTCTTC
ACAATGGCCCGTGTGAATGT
TAACCCGTCGGAACGGGAAT
AGTTACGCATTAACACGACG
CAGGAGAGATTGAATAGAGA
GCGCTGTATCGTAGCTTCAC
GCTCGCACGCGGTCAACTTG
GGTGCGAAGAAACCGAAATC
ATGCGGGCTCGCACGCGGTC
GGTGTACCTCGGGCTTTAAG
TACCGCAGAAGTTCGGCAAA
GTGCCCCATGTTCGCGTTTA
GTAAGGCCATTTGTGCCTTT
GCCCTTGACACGAACACTAA
GAACCAGGTCAAAGGGAGCT
TTACGTCCGAGCTATAGCAG
ATGAGTGACATCAAGGTCCC
ACCAACGCAAGAGGTCCATT
ATCAAAGGAGAGCCCCTTGC
ATTACGTCACCATTCCCAGC
TCACAAATGGTCGAAACTAG
ACATACACAATGCGGCTGTC
ACCATTCCCAGCAAATGCCT
AGCTGCACTATCGGTTTGGT
CAGCAAAAATTTTGGACCGC
GTGGTTGTGATCAGCGGCCG
TGGAACCCATTCTTCCCCTG
AGAGTCACATACACAATGCG
TAGCTGCACTATCGGTTTGG
CTAAACAAAGGTGGTTGTGA
AGTTAGAGCGGGAACTGCTG
GTTGTGATCAGCGGCCGGGA
CGTCGGAACGGGAATCCTGG
CACCATTCCCAGCAAATGCC
CCTTTTTGGTGTGGAACCCA
AAATTTTGGACCGCTGTCCT
CATGGCTATAGCAACAATAA
TCGAAACTAGCTGCACTATC
AAAGGGAGCTATAACAGCCA
ATATTCGAGAGTCACATACA
AAAAATTTTGGACCGCTGTC
TTAACACGACGTTCAGCGAA
TCGTATACCGCAGAAGTTCG
CGAACACTAAACAAAGGTGG
AGCAGCAAAAATTTTGGACC
AGTTCGGCAAAGTTAGAGCG
TTCCCAGCAAATGCCTTTCT
ATTCGAGAGTCACATACACA
GTTGTCGTCATGGTTACAAA
GGGGGTAAGGCCATTTGTGC
TTCGGCAAAGTTAGAGCGGG
AACAAAGGTGGTTGTGATCA
TGCTGCAGGGCTCCCGCGCC
TGGTAAGTTACGCATTAACA
CTTGACACGAACACTAAACA
CCATCAAAGGAGAGCCCCTT
AGCGAAGAACGAATACGACA
TTGGCCGAACCAGGTCAAAG
TTTATCTCGTGCATCCAGGA
CTCGGGCTTTAAGGTTAGGG
CGTAGCTTCACAAATGGTCG
TTACGTCACCATTCCCAGCA
GGAAACGACTAGATAGATCC
CGAGGCCCTTGACACGAACA
ACCTCGGGCTTTAAGGTTAG
TCCCACACAGCGTGCCATTC
TTTTTGCCCATTATAATCCA
AGCAAATGCCTTTCTTTAGG
CGATCGTATACCGCAGAAGT

49
chapter3/populate_templates.py

@@ -0,0 +1,49 @@
import jinja2
import os


def main():
    # Jinja env: templates are loaded from the current directory
    env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'))

    # One entry per problem; the keys match the variables used in template.go.j2
    problems = [
        {
            'chapter': '3',
            'problem': 'a',
            'title': 'Generate k-mer Composition of a String',
            'description': 'Given an input string, generate a list of all kmers that are in the input string.',
            'url': 'http://rosalind.info/problems/ba3a/'
        },
        {
            'chapter': '3',
            'problem': 'b',
            'title': 'Reconstruct string from genome path',
            'description': 'Reconstruct a string from its genome path, i.e., sequential fragments of overlapping DNA.',
            'url': 'http://rosalind.info/problems/ba3b/'
        },
        {
            'chapter': '3',
            'problem': 'c',
            'title': 'Construct the overlap graph of a set of k-mers',
            'description': 'Given a set of overlapping k-mers, construct the overlap graph and print a sorted adjacency list',
            'url': 'http://rosalind.info/problems/ba3c/'
        },
    ]

    print("Writing problem boilerplate code")

    t = 'template.go.j2'
    for problem in problems:
        contents = env.get_template(t).render(**problem)
        fname = 'ba'+problem['chapter']+problem['problem']+'.go'
        # Never overwrite an existing solution file
        if not os.path.exists(fname):
            print("Writing to file %s..."%(fname))
            with open(fname,'w') as f:
                f.write(contents)
        else:
            print("File %s already exists, skipping..."%(fname))

    print("Done")


if __name__=="__main__":
    main()
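The three problem descriptions above (k-mer composition, string reconstruction from a genome path, overlap graph) can be sketched in Go roughly as follows. This is a minimal illustration, not the code in the repository's `rosalind` package: the function names `KmerComposition`, `ReconstructFromPath`, and `OverlapGraph`, their signatures, and the `A -> B` edge format are assumptions made for this example.

```go
package main

import (
	"errors"
	"sort"
)

// KmerComposition (hypothetical name) returns every length-k window of text,
// sorted lexicographically. Repeated k-mers are kept.
func KmerComposition(text string, k int) ([]string, error) {
	if k <= 0 || k > len(text) {
		return nil, errors.New("k must be between 1 and len(text)")
	}
	kmers := make([]string, 0, len(text)-k+1)
	for i := 0; i+k <= len(text); i++ {
		kmers = append(kmers, text[i:i+k])
	}
	sort.Strings(kmers)
	return kmers, nil
}

// ReconstructFromPath (hypothetical name) spells the string given by a genome
// path: consecutive k-mers must overlap by k-1 symbols, so each k-mer after
// the first contributes only its last symbol.
func ReconstructFromPath(path []string) (string, error) {
	if len(path) == 0 || len(path[0]) == 0 {
		return "", errors.New("genome path must contain non-empty k-mers")
	}
	k := len(path[0])
	genome := []byte(path[0])
	for i := 1; i < len(path); i++ {
		prev, cur := path[i-1], path[i]
		if len(cur) != k || prev[1:] != cur[:k-1] {
			return "", errors.New("consecutive k-mers do not overlap by k-1 symbols")
		}
		genome = append(genome, cur[k-1])
	}
	return string(genome), nil
}

// OverlapGraph (hypothetical name) returns the sorted edges "A -> B" for every
// pair where the (k-1)-suffix of A equals the (k-1)-prefix of B.
func OverlapGraph(kmers []string) []string {
	// Index k-mers by their prefix so each suffix lookup is a map access.
	byPrefix := make(map[string][]string)
	for _, km := range kmers {
		if len(km) == 0 {
			continue
		}
		prefix := km[:len(km)-1]
		byPrefix[prefix] = append(byPrefix[prefix], km)
	}
	var edges []string
	for _, km := range kmers {
		if len(km) == 0 {
			continue
		}
		for _, next := range byPrefix[km[1:]] {
			edges = append(edges, km+" -> "+next)
		}
	}
	sort.Strings(edges)
	return edges
}
```

For example, `KmerComposition("CAATCCAAC", 5)` yields `AATCC ATCCA CAATC CCAAC TCCAA`, and `ReconstructFromPath` applied to `ACCGA CCGAA CGAAG GAAGC AAGCT` yields `ACCGAAGCT`. Indexing by prefix keeps the overlap graph construction close to linear in the number of k-mers instead of comparing every pair.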

49
chapter3/template.go.j2

@@ -0,0 +1,49 @@
package rosalindchapter{{chapter}}

import (
	"fmt"
	"log"

	rosa "github.com/charlesreid1/go-rosalind/rosalind"
)

// Print problem description for Rosalind.info
// Problem BA{{chapter}}{{problem}}: {{title}}
func BA{{chapter}}{{problem}}Description() {
	description := []string{
		"-----------------------------------------",
		"Rosalind: Problem BA{{chapter}}{{problem}}:",
		"{{title}}",
		"",
		"{{description}}",
		"",
		"URL: {{url}}",
		"",
	}
	for _, line := range description {
		fmt.Println(line)
	}
}

// Run the problem
func BA{{chapter}}{{problem}}(filename string) {

	BA{{chapter}}{{problem}}Description()

	// Read the contents of the input file
	// into a single string
	lines, err := rosa.ReadLines(filename)
	if err != nil {
		log.Fatalf("rosa.ReadLines: %v", err)
	}

	// Placeholder: keeps the generated file compiling until
	// the solution below is filled in and uses lines.
	_ = lines

	//// Input file contents
	//input := lines[0]
	//pattern := lines[1]

	//result := rosa.PatternCount(input, pattern)
	//
	//fmt.Println("")
	//fmt.Printf("Computed result from input file: %s\n", filename)
	//fmt.Println(result)
}

4
rosalind/Readme.md

@@ -0,0 +1,4 @@
# rosalind go package
This directory contains the `rosalind` Go package.

0
chapter01/data/clump_finding.txt → rosalind/data/clump_finding.txt

0
chapter01/data/frequent_words.txt → rosalind/data/frequent_words.txt

5
rosalind/data/frequent_words_mismatch.txt

@@ -0,0 +1,5 @@
Input:
CACAGTAGGCGCCGGCACACACAGCCCCGGGCCCCGGGCCGCCCCGGGCCGGCGGCCGCCGGCGCCGGCACACCGGCACAGCCGTACCGGCACAGTAGTACCGGCCGGCCGGCACACCGGCACACCGGGTACACACCGGGGCGCACACACAGGCGGGCGCCGGGCCCCGGGCCGTACCGGGCCGCCGGCGGCCCACAGGCGCCGGCACAGTACCGGCACACACAGTAGCCCACACACAGGCGGGCGGTAGCCGGCGCACACACACACAGTAGGCGCACAGCCGCCCACACACACCGGCCGGCCGGCACAGGCGGGCGGGCGCACACACACCGGCACAGTAGTAGGCGGCCGGCGCACAGCC
10 2
Output:
GCACACAGAC GCGCACACAC
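The data file above holds a text, the parameters `k` and `d`, and the most frequent k-mers with up to `d` mismatches. One common way to compute such an answer is to generate the d-neighborhood of every window and tally the neighbors. The sketch below assumes that approach; the names `HammingDistance`, `Neighbors`, and `FrequentWordsWithMismatches` are chosen for this example and are not taken from the repository.

```go
package main

import "sort"

const alphabet = "ACGT"

// HammingDistance counts mismatching positions; it assumes len(a) == len(b).
func HammingDistance(a, b string) int {
	dist := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			dist++
		}
	}
	return dist
}

// Neighbors returns every string within Hamming distance d of pattern,
// each exactly once, using the usual recursion on the pattern's suffix.
func Neighbors(pattern string, d int) []string {
	if d <= 0 || len(pattern) == 0 {
		return []string{pattern}
	}
	if len(pattern) == 1 {
		return []string{"A", "C", "G", "T"}
	}
	first, rest := pattern[0], pattern[1:]
	var out []string
	for _, s := range Neighbors(rest, d) {
		if HammingDistance(rest, s) < d {
			// The suffix still has a mismatch to spare: any first symbol works.
			for i := 0; i < len(alphabet); i++ {
				out = append(out, string(alphabet[i])+s)
			}
		} else {
			// The suffix already uses all d mismatches: keep the first symbol.
			out = append(out, string(first)+s)
		}
	}
	return out
}

// FrequentWordsWithMismatches returns, in sorted order, the k-mers that occur
// most often in text when up to d mismatches per occurrence are allowed.
func FrequentWordsWithMismatches(text string, k, d int) []string {
	if k <= 0 || k > len(text) || d < 0 {
		return nil
	}
	counts := make(map[string]int)
	for i := 0; i+k <= len(text); i++ {
		for _, n := range Neighbors(text[i:i+k], d) {
			counts[n]++
		}
	}
	best := 0
	for _, c := range counts {
		if c > best {
			best = c
		}
	}
	var result []string
	for p, c := range counts {
		if c == best {
			result = append(result, p)
		}
	}
	sort.Strings(result)
	return result
}
```

Tallying neighborhoods avoids scanning all 4^k candidate patterns, which matters for inputs like the one above with k = 10. The result is sorted only to make the output deterministic; Rosalind accepts the k-mers in any order.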

5
rosalind/data/frequent_words_mismatch_complements.txt

@@ -0,0 +1,5 @@
Input
CTTGCCGGCGCCGATTATACGATCGCGGCCGCTTGCCTTCTTTATAATGCATCGGCGCCGCGATCTTGCTATATACGTACGCTTCGCTTGCATCTTGCGCGCATTACGTACTTATCGATTACTTATCTTCGATGCCGGCCGGCATATGCCGCTTTAGCATCGATCGATCGTACTTTACGCGTATAGCCGCTTCGCTTGCCGTACGCGATGCTAGCATATGCTAGCGCTAATTACTTAT
9 3
Output
AGCGCCGCT AGCGGCGCT
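The variant with reverse complements scores a pattern by its approximate occurrences plus those of its reverse complement. A rough sketch of that scoring step is given here; `ReverseComplement`, `ApproxCount`, and `CountWithReverseComplement` are illustrative names assumed for this example, and symbols other than A, C, G, T are passed through unchanged as a simplifying assumption.

```go
package main

// ReverseComplement reverses dna and swaps A<->T and C<->G.
// Unknown symbols are copied as-is (an assumption of this sketch).
func ReverseComplement(dna string) string {
	complement := map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
	out := make([]byte, len(dna))
	for i := 0; i < len(dna); i++ {
		c, ok := complement[dna[i]]
		if !ok {
			c = dna[i]
		}
		out[len(dna)-1-i] = c
	}
	return string(out)
}

// ApproxCount counts the windows of text that differ from pattern
// in at most d positions.
func ApproxCount(text, pattern string, d int) int {
	k := len(pattern)
	if k == 0 {
		return 0
	}
	count := 0
	for i := 0; i+k <= len(text); i++ {
		mismatches := 0
		for j := 0; j < k; j++ {
			if text[i+j] != pattern[j] {
				mismatches++
			}
		}
		if mismatches <= d {
			count++
		}
	}
	return count
}

// CountWithReverseComplement is the score used by the reverse-complement
// variant: approximate occurrences of pattern plus those of its reverse complement.
func CountWithReverseComplement(text, pattern string, d int) int {
	return ApproxCount(text, pattern, d) + ApproxCount(text, ReverseComplement(pattern), d)
}
```

With the text `ACGTTGCATGTCGCATGATGCATGAGAGCT` and d = 1, `ATGT` has 5 approximate occurrences and its reverse complement `ACAT` has 4, giving a combined score of 9.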

4979
rosalind/data/genome_path_string.txt

File diff suppressed because one or more lines are too long

0
chapter01/data/hamming_distance.txt → rosalind/data/hamming_distance.txt

0
chapter01/data/minimum_skew.txt → rosalind/data/minimum_skew.txt

10
rosalind/data/motif_enumeration.txt

@@ -0,0 +1,10 @@
Input
5 2
TCTGAGCTTGCGTTATTTTTAGACC
GTTTGACGGGAACCCGACGCCTATA
TTTTAGATTTCCTCAGTCCACTATA
CTTACAATTTCGTTATTTATCTAAT
CAGTAGGAATAGCCACTTTGTTGTA
AAATCCATTAAGGAAAGACGACCGT
Output
AAACT AAATC AACAC AACAT AACCT AACTA AACTC AACTG AACTT AAGAA AAGCT AAGGT AAGTC AATAC AATAT AATCC AATCT AATGC AATTC AATTG ACAAC ACACA ACACC ACACG ACACT ACAGA ACAGC ACATC ACATG ACCAT ACCCT ACCGT ACCTA ACCTC ACCTG ACCTT ACGAC ACGAG ACGAT ACGCT ACGGT ACGTC ACGTT ACTAA ACTAG ACTAT ACTCA ACTCC ACTCG ACTCT ACTGA ACTGC ACTGT ACTTA ACTTC ACTTT AGAAA AGAAC AGAAG AGAAT AGACA AGACT AGATA AGATC AGCAT AGCCA AGCGT AGCTA AGCTC AGCTG AGCTT AGGAT AGGTA AGGTC AGTAA AGTAC AGTAT AGTCC AGTCG AGTCT AGTGA AGTTG ATAAA ATAAC ATACA ATACC ATAGA ATATA ATATC ATATG ATATT ATCAG ATCCC ATCCG ATCCT ATCGA ATCGC ATCTA ATCTC ATCTG ATGAC ATGAT ATGCA ATGCC ATGGA ATGGC ATGTA ATGTC ATTAA ATTAC ATTAG ATTAT ATTCA ATTCC ATTCG ATTGA ATTGC ATTGG ATTGT ATTTA ATTTC ATTTG ATTTT CAAAG CAACC CAACT CAAGA CAAGC CAATA CAATT CACAC CACAG CACCT CACGT CACTA CACTT CAGAA CAGAC CAGAT CAGGT CAGTA CAGTC CATAA CATAC CATAG CATAT CATCC CATCT CATGA CATGT CATTA CATTG CATTT CCAAG CCATA CCATG CCATT CCCGT CCCTA CCCTT CCGAA CCGAC CCGAT CCGCT CCGGT CCGTA CCGTC CCGTG CCGTT CCTAC CCTAT CCTCA CCTCC CCTTA CCTTC CCTTG CCTTT CGAAA CGAAG CGACA CGACT CGAGT CGATA CGATG CGATT CGCAA CGCAT CGCCA CGCGA CGCTA CGCTC CGCTT CGGAC CGGAT CGGCA CGGTA CGGTC CGGTT CGTAA CGTAC CGTCA CGTCG CGTCT CGTTA CGTTT CTAAC CTAAG CTAAT CTACA CTACC CTACG CTACT CTAGA CTAGC CTAGG CTAGT CTATA CTATC CTATG CTATT CTCAT CTCCG CTCGT CTCTA CTCTT CTGAA CTGAG CTGCA CTGCC CTGTA CTGTT CTTAA CTTAC CTTAG CTTAT CTTCA CTTGA CTTTA CTTTC CTTTG CTTTT GAAAT GAACA GAACT GAAGT GAATG GAATT GACAC GACAT GACCA GACCT GACGT GACTT GAGAA GAGAT GAGCT GATAA GATAC GATAG GATAT GATCA GATCC GATCG GATCT GATGT GATTA GATTC GATTG GATTT GCAAT GCACT GCATC GCATT GCCAT GCCGT GCCTA GCCTT GCGAT GCGGT GCGTC GCGTT GCTAA GCTAC GCTAG GCTAT GCTGA GCTGT GCTTA GCTTT GGAAT GGACA GGATA GGATC GGATT GGCTA GGGAT GGTAC GGTAG GGTAT GGTCA GGTCG GGTTA GTAAA GTAAG GTACA GTACC GTACG GTAGA GTATA GTATC GTATG GTATT GTCAA GTCAG GTCCG GTCCT GTCGA GTCGC GTCGT GTCTA GTCTG GTGAA GTGAG GTGCA GTGCG GTTAA GTTAC GTTAG GTTAT GTTCA GTTCC GTTCG GTTGA 
GTTTA TAAAC TAAAG TAACA TAACC TAACT TAAGA TAAGC TAATA TAATC TACAC TACAG TACCC TACCG TACCT TACGA TACGC TACGT TACTA TACTC TACTG TAGAA TAGAC TAGAG TAGAT TAGCC TAGCG TAGGA TAGTC TATAA TATAC TATAT TATCA TATCC TATCG TATGA TATGC TATGG TATGT TATTA TATTG TCAAC TCAAT TCACC TCACG TCACT TCAGA TCATA TCATG TCCAA TCCAC TCCAG TCCAT TCCCA TCCCT TCCGA TCCGC TCCGT TCCTA TCCTG TCCTT TCGAA TCGAC TCGAT TCGCC TCGCT TCGGA TCGGC TCGGG TCGGT TCGTC TCTAC TCTAG TCTAT TCTCC TCTCT TCTGG TCTGT TCTTA TCTTT TGAAA TGAAC TGAAT TGACA TGACC TGACT TGAGA TGAGC TGAGT TGATA TGATC TGATG TGATT TGCAA TGCAC TGCAG TGCAT TGCCA TGCCG TGCCT TGCGA TGCGT TGCTT TGGAA TGGAT TGGTA TGTAA TGTAG TGTAT TGTCC TGTCG TGTGG TGTTA TTAAA TTAAC TTAAG TTAAT TTACA TTACC TTACG TTACT TTAGA TTAGC TTAGG TTAGT TTATA TTATC TTATG TTATT TTCAA TTCAC TTCAT TTCCA TTCCC TTCCT TTCGA TTCGG TTCGT TTCTA TTCTG TTGAA TTGAC TTGAG TTGAT TTGCA TTGCG TTGGA TTGGG TTGTG TTTAA TTTAC TTTAG TTTAT TTTCA TTTCC TTTCG TTTGA TTTGG TTTTA TTTTG
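The motif enumeration data above lists `k` and `d`, a collection of DNA strings, and every (k, d)-motif: a k-mer that appears in each string with at most `d` mismatches. A brute-force sketch of that search is shown below. It draws candidates from the d-neighborhoods of the first string's windows, which is sufficient because a motif must also appear in the first string. The names `MotifEnumeration`, `neighbors`, and `appearsWithMismatches` are assumptions for this example, not the repository's API.

```go
package main

import "sort"

// hamming counts mismatching positions; it assumes equal lengths.
func hamming(a, b string) int {
	dist := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			dist++
		}
	}
	return dist
}

// neighbors returns all strings within Hamming distance d of pattern.
func neighbors(pattern string, d int) []string {
	if d <= 0 || len(pattern) == 0 {
		return []string{pattern}
	}
	if len(pattern) == 1 {
		return []string{"A", "C", "G", "T"}
	}
	rest := pattern[1:]
	var out []string
	for _, s := range neighbors(rest, d) {
		if hamming(rest, s) < d {
			for _, c := range []string{"A", "C", "G", "T"} {
				out = append(out, c+s)
			}
		} else {
			out = append(out, pattern[:1]+s)
		}
	}
	return out
}

// appearsWithMismatches reports whether some window of text is within
// Hamming distance d of pattern.
func appearsWithMismatches(text, pattern string, d int) bool {
	k := len(pattern)
	for i := 0; i+k <= len(text); i++ {
		if hamming(text[i:i+k], pattern) <= d {
			return true
		}
	}
	return false
}

// MotifEnumeration returns the sorted (k, d)-motifs of dna.
func MotifEnumeration(dna []string, k, d int) []string {
	if len(dna) == 0 || k <= 0 || d < 0 {
		return nil
	}
	seen := make(map[string]bool)
	var motifs []string
	first := dna[0]
	for i := 0; i+k <= len(first); i++ {
		for _, cand := range neighbors(first[i:i+k], d) {
			if seen[cand] {
				continue // each candidate is tested once
			}
			seen[cand] = true
			inAll := true
			for _, s := range dna[1:] {
				if !appearsWithMismatches(s, cand, d) {
					inAll = false
					break
				}
			}
			if inAll {
				motifs = append(motifs, cand)
			}
		}
	}
	sort.Strings(motifs)
	return motifs
}
```

For the strings `ATTTGGC`, `TGCCTTA`, `CGGTATC`, `GAAAATT` with k = 3 and d = 1, this returns `ATA ATT GTT TTT`.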

2624
rosalind/data/neighbors.txt

File diff suppressed because it is too large

5
rosalind/data/number_to_pattern.txt

@@ -0,0 +1,5 @@
Input
5353
7
Output
CCATGGC
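The sample above maps the index 5353 with k = 7 to `CCATGGC`. One way to read this is as a base-4 expansion with A=0, C=1, G=2, T=3, padded on the left to k digits. The following sketch assumes that encoding; `NumberToPattern` and its signature are illustrative and may differ from the repository's function.

```go
package main

// NumberToPattern writes index as a k-digit base-4 number using the
// digits A=0, C=1, G=2, T=3. It returns "" for negative arguments.
func NumberToPattern(index, k int) string {
	if index < 0 || k < 0 {
		return ""
	}
	const symbols = "ACGT"
	pattern := make([]byte, k)
	// Fill from the least significant digit (rightmost symbol) backwards.
	for i := k - 1; i >= 0; i-- {
		pattern[i] = symbols[index%4]
		index /= 4
	}
	return string(pattern)
}
```

Checking the sample by hand: 5353 = 1·4096 + 1·1024 + 0·256 + 3·64 + 2·16 + 2·4 + 1, so the digits 1 1 0 3 2 2 1 spell `CCATGGC`. Digits above the k-th position are silently dropped, so callers are expected to pass an index below 4^k.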

19953
rosalind/data/overlap_graph.txt

File diff suppressed because it is too large

0
chapter01/data/pattern_count.txt → rosalind/data/pattern_count.txt

0
chapter01/data/pattern_matching.txt → rosalind/data/pattern_matching.txt

4
rosalind/data/pattern_to_number.txt

@@ -0,0 +1,4 @@
Input
CTTCTCACGTACAACAAAATC
Output
2161555804173
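The sample above is the inverse mapping: a DNA string read as a base-4 number with A=0, C=1, G=2, T=3. A minimal sketch follows, assuming that encoding; the name `PatternToNumber`, the `int64` return type, and the error on invalid symbols are choices made for this example.

```go
package main

import "fmt"

// PatternToNumber interprets pattern as a base-4 number with the digits
// A=0, C=1, G=2, T=3. Patterns longer than 31 symbols can overflow int64.
func PatternToNumber(pattern string) (int64, error) {
	var n int64
	for i := 0; i < len(pattern); i++ {
		var digit int64
		switch pattern[i] {
		case 'A':
			digit = 0
		case 'C':
			digit = 1
		case 'G':
			digit = 2
		case 'T':
			digit = 3
		default:
			return 0, fmt.Errorf("invalid nucleotide %q at position %d", pattern[i], i)
		}
		// Shift the accumulated value one base-4 place and add the new digit.
		n = n*4 + digit
	}
	return n, nil
}
```

For the 21-symbol input `CTTCTCACGTACAACAAAATC` this gives 2161555804173, matching the expected output in the data file. The value exceeds the 32-bit range, which is why the sketch uses `int64` rather than relying on the platform's `int` width.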

0
chapter01/data/reverse_complement.txt → rosalind/data/reverse_complement.txt

9517
rosalind/data/string_composition.txt

File diff suppressed because one or more lines are too long

Some files were not shown because too many files have changed in this diff